OPENING SESSION OF THE 27TH ANNUAL
MILITARY TESTING ASSOCIATION CONFERENCE

21  October  1985 


The  27th  Annual  Conference  of  the  Military  Testing  Association  was  hosted  by  the 
Navy Personnel Research and Development Center (NPRDC). The Conference was held
at  the  Bahia  Hotel  in  San  Diego,  California,  21  through  25  October  1985.  A  total 
of  169  paper  and  symposium  presentations  were  given  during  57  sessions 
structured  into  three  concurrent  tracks.  Conference  attendance  was  326. 


CALL  TO  ORDER:  Dr.  Martin  F.  Wiskoff,  Head,  Manpower  and  Personnel  Laboratory, 
NPRDC,  and  MTA  Chairman  called  the  Conference  to  order  at  1300,  21  October.  Dr. 
Wiskoff  then  introduced  the  MTA  President,  CAPT  Howard  S.  Eldredge,  Commanding 
Officer,  NPRDC. 


WELCOME:  CAPT  Eldredge  officially  welcomed  the  attendees  to  the  Conference  and 
San  Diego.  He  discussed  the  importance  of  personnel  systems  to  missions  of  the 
armed  forces  and  emphasized  how  they  represent  the  key  to  military  superiority. 
CAPT  Eldredge  then  expressed  the  hope  that  the  formal  presentations  and  the 
informal  interchanges  the  attendees  would  experience  during  the  week  would  go 
far  to  further  personnel  research  and  development. 


KEYNOTE:  Prior  to  the  introduction  of  the  keynote  speaker,  Brigadier  General 
Caleb  J.  Archer,  Commander,  U.S.  Military  Entrance  Processing  Command,  Dr. 
Wiskoff  gave  a  brief  summary  of  the  General's  career.  Dr.  Wiskoff  noted  that 
General  Archer  has  held  a  wide  variety  of  important  command  and  staff  positions. 
In  1966  he  served  in  Vietnam  commanding  the  212th  Military  Police  Company  and 
later  was  Chief  of  Physical  Security  for  the  U.S.  Army  Vietnam.  General  Archer 
also  served  on  the  faculty  of  the  Army  Command  and  General  Staff  College  and  has 
served  as  Provost  Marshal  of  the  3rd  Infantry  Division  in  Wurzburg,  Germany  and 
Commander of the 793rd Military Police Battalion, Nuremberg, Germany. General
Archer  has  also  served  in  Washington,  DC  as  the  Army  Deputy  Chief  of  Staff  for 
Personnel.  He  has  been  the  Provost  Marshal  of  the  Army  Field  Artillery  Center 
at  Fort  Sill,  Oklahoma.  His  most  recent  assignments  have  included  Commander, 
Western  Region  Recruiting  Command  at  the  Presidio  of  San  Francisco,  Commandant 
of  the  Army  Military  Police  School  and  Deputy  Commanding  General  of  the  Army 
Military Police and Chemical Training Centers, Fort McClellan, Alabama.


Dr.  Wiskoff  then  asked  the  attendees  to  give  a  warm  welcome  to  General  Archer. 
(General  Archer's  address  follows  in  its  entirety.) 
THE U. S. MILITARY ENTRANCE PROCESSING COMMAND AND ITS TESTING
PROGRAM FOR THE AIDS VIRUS


Brigadier  General  Caleb  J.  Archer 
U.  S.  Military  Entrance  Processing 
Command 


CAPT  Eldredge,  Dr.  Lancaster,  Dr.  Wiskoff,  ladies  and  gentlemen. 

It  is  a  distinct  privilege  and  honor  for  me  to  be  invited  to  speak 
to  this  group  of  researchers  and  professionals.  Although  I've  only 
been  the  Commander  of  the  U.  S.  Military  Entrance  Processing 
Command  (USMEPCOM)  for  three  months,  I've  seen  already  what  an 
important  part  research  plays  in  our  command  and,  in  fact,  the 
whole  Armed  Forces.  Your  efforts  in  developing  the  various  forms 
of  testing  are  very  important  to  us.  As  past  Commandant  of  the 
Military  Police  School  I  can  also  assure  you  of  the  importance  of 
other  areas  of  personnel  research  and  development  such  as  your 
efforts  involving  women  in  the  Armed  Services. 

At  this  time  we  in  USMEPCOM  are  faced  with  the  complex  issue  of  the 
Acquired Immune Deficiency Syndrome (AIDS). It is an issue that
will  require  immense  effort  to  see  the  problem  to  a  successful 
solution.  We'll  need  your  help. 

What  I  would  like  to  do  today  is  first  to  tell  you  a  little  bit 
about  the  Command  and  then  to  talk  about  the  mission  we  have 
received  in  USMEPCOM  to  test  all  new  applicants  to  the  Armed  Forces 
for  the  HTLV-III  antibody  related  to  the  AIDS  virus. 

USMEPCOM  reports  directly  to  the  Department  of  Defense  through  Dr. 
Steve  Sellman  and  Dr.  Lancaster's  office,  and  up  to  LTG  Chavarrie. 
We  have  a  very  direct  route  through  Headquarters  DCSPER  because 
they  are  the  executive  agent  for  processing  for  the  Armed  Forces. 
Our biggest customers are the four recruiting commands - Army,
Navy, Air Force, and Marine Corps - and the fourteen training
centers - eight Army, three Navy, one Air Force, and two Marine Corps.
We  qualify  applicants  for  this  complex  system. 

We  have  68  Military  Entrance  Processing  Stations  (MEPS)  throughout 
the  U.  S.  and  overseas  and  two  substations  -  one  in  Guam  and  one  in 
Alaska.  This  structure  is  broken  down  into  three  sectors  -  the 
Western  Sector  under  Marine  Corps  COL  Bill  Stroup,  headquartered  in 
San  Francisco;  the  Central  Sector  under  Army  COL  James  Tyler, 
headquartered  in  Chicago;  and  the  Eastern  Sector  under  Air  Force 
COL  Linda  Sendt  at  Fort  Meade,  Maryland.  We  are  a  joint  staff 
agency with about 49% Army, 21% Navy, 19% Air Force, and 11%
Marine Corps. The service affiliations of our commanders reflect
these  service  percentages.  However,  we  have  24  female  commanders 
out  of  a  total  of  68  because  all  of  our  Navy  commanders  are  female. 
This  is  because  the  Navy  is  seeking  command  billets  for  this  level 
of  female  officer  and,  fortunately  for  us,  we  are  able  to  take 
advantage  of  the  situation. 

Our screening process is based on the Armed Services Vocational
Aptitude Battery (ASVAB). We administer the ASVAB to about one
million  applicants  at  our  testing  sites  each  year  and  we  administer 
an  additional  million  in  high  schools.  So  we  give  the  test  about 
two  million  times  each  year. 

We  give  about  540,000  physical  exams  at  our  stations  a  year  but 
next  year  that  will  rise  to  700,000  because  we  are  taking  on  the 
responsibility  of  examining  for  the  Army  National  Guard  starting  1 
December. 

In  addition  to  testing,  USMEPCOM  interviews  applicants  to  detect 
law  violations  and  other  disqualifying  past  conduct. 

Out  of  the  total,  the  selection  process  pares  the  number  down  to 
about  380,000  that  actually  join  the  Armed  Forces  through  the 
Military  Entrance  Processing  Stations  (MEPS)  per  year.  All  but 
about  5%  of  the  total  enter  the  Delayed  Entry  Program.  This 
program  allows  the  participants  to  delay  entrance  for  up  to  a  year. 
Currently  the  program  has  about  130,000  in  it  but  this  number 
varies  up  to  about  150,000.  So  we  are  always  working  with  the 
future  in  mind  in  our  enlistment  process. 

For  our  production  tests,  the  ones  we  give  to  the  million 
applicants,  we  use  the  ASVAB  11,  12,  and  13  forms  which  many  of  you 
have  worked  on.  For  the  school  test,  administered  in  the  high 
schools  and  junior  colleges,  we  use  ASVAB  14.  With  the  help  of  Dr. 
Lancaster  and  others,  we  have  just  got  testing  specialists  in  all 
81  of  our  stations.  These  specialists  help  market  ASVAB  to  the 
schools.  We  market  it  as  the  excellent  counseling  tool  it  is  but, 
of  course,  it  also  has  great  value  to  the  recruiters.  It  allows 
the  recruiters  to  direct  their  attention  to  the  high  quality 
youngsters  the  military  needs  to  enlist. 

We  have  some  difficulty  getting  into  some  of  the  high  schools  to 
test.  We  give  the  ASVAB  in  about  79%  of  the  schools  but  some  of 
them  test  only  three  to  five  examinees  so  the  totals  don't  add  up 
very  fast.  We  only  test  about  13%  of  the  available  high  school 
juniors  and  seniors  in  the  country.  Endorsements  from  state 
agencies,  city  officials,  and  school  administrators  help  the 
program,  but  the  strong  feeling  that  any  outside  activity  detracts 
from  the  education  process  plays  against  our  initiatives. 

We  also  administer  special  purpose  tests  at  the  time  of  enlistment 
for  the  services  -  tests  which  many  of  you  helped  develop  -  and  my 
staff  and  I  participate  in  the  planning  of  Department  of  Defense 
research. 

I  now  want  to  give  you  an  example  of  the  kind  of  research  most  of 
our  commanders  grasp  at  —  specifically  that  which  gives  the 
commander  a  product  in  the  near  term.  When  we  marketed  the  ASVAB 
to  the  schools  in  July  1984,  we  thought  the  test  would  take  only 
three  hours  to  administer  but  with  the  inclusion  of  the 
instructions  the  time  was  increased  to  about  three  hours  and  twenty 
minutes.  Now  this  does  not  sound  like  much  of  a  problem  but  all 
the testing periods had been blocked into three hours. The school
bells rang and students got up and wandered off. We had many
incomplete tests and test security was compromised. Some of the
schools  quit  the  program  and  the  recruiters  were  becoming  upset. 

A  research  program  was  quickly  implemented  by  the  Command  with  Navy 
trainees  to  see  if  shortened  instructions  would  still  be  effective. 
It was determined that the experimental and control groups did
equally well and that shortening the instructions would have no
impact  on  test  validity,  so  we  were  able  to  shorten  the  test  back 
to  three  hours  and  continue  with  the  high  school  testing  program. 

Another  example  of  research  with  great  value  to  USMEPCOM  is  the 
Computerized  Adaptive  Testing  -  Armed  Services  Vocational  Aptitude 
Battery  (CAT-ASVAB)  program  which  will  be  discussed  thoroughly  this 
week.  CAT  is  an  example  of  basic  research  which  has  matured  to  the 
practical  level;  we  will  be  putting  it  to  use  in  the  near  future. 

I  won't  cover  the  CAT-ASVAB  milestones  since  they  will  be  discussed 
in  depth  during  the  week's  sessions.  However,  the  one  I  am 
interested  in  is  when  we  declare  victory  and  actually  start  CAT- 
ASVAB  testing  in  the  MEPS.  In  the  meantime  it  is  going  to  be  a 
burden  to  get  CAT-ASVAB  evaluated,  especially  for  the  recruiters 
who  must  bring  in  applicants  and  tell  them  they  will  have  to  take  6 
hours  of  tests.  We  are  going  to  pay  the  price  to  stay  on  schedule. 
Already  we  are  getting  resistance  from  the  recruiter  force  but  CAT- 
ASVAB  will  be  operationalized. 

What  I  would  like  to  do  now  is  to  switch  to  a  discussion  of  our 
effort  to  test  for  the  AIDS  virus.  Most  of  us  believed  up  until  a 
few  months  ago  that  this  was  a  disease  that  affected  homosexual 
men,  intravenous  drug  users,  and,  tragically,  hemophiliac  patients 
but  did  not  have  much  impact  on  the  population  in  general.  But  I 
can  tell  you  now  that  an  estimated  1.2  million  Americans  are 
currently  carrying  the  antibody.  Occurrence  of  the  disease  is 
estimated  to  be  doubling  every  year.  Young  people  at  enlistment 
age  will  be  the  most  affected.  In  the  past  members  of  the  Armed 
Forces  have  been  six  times  as  likely  as  the  general  population  to 
contract  venereal  disease.  If  this  trend  holds  for  AIDS  we  as 
researchers  and  leaders  must  be  very  concerned  about  this  disease 
and  its  potential  for  disruption  of  the  Armed  Services. 

As  most  of  you  know  the  Department  of  Defense  made  a  decision  on  30 
August  to  test  all  new  applicants  for  the  HTLV-III  antibody.  A 
positive  test  result  indicates  that  the  individual  has  come  in 
contact  with  the  AIDS  virus  some  time  in  the  past.  It  does  not 
indicate  that  the  individual  has  the  virus  now  nor  does  it  mean 
that  the  individual  has  the  disease  now.  However,  recent  tests 
show  that  a  very  high  percentage  of  people  testing  positive  for  the 
antibody  do  carry  the  virus  at  that  time. 

The  testing  program  has  already  begun.  It  started  on  10  October  in 
the  training  centers  and  on  15  October  in  the  MEPS.  Based  on  this 
test  we  will  deny  enlistment  to  applicants  who  test  positive  to  the 
antibody. 

In implementing this program I wanted to maintain the current
processing  flow  because  it  is  critical  to  manning  the  force.  We 
wanted  to  minimize  the  impact  on  shipping.  Actually,  as  I 
mentioned,  we  don't  have  many  straight  shippers.  Almost  everybody 
we  enlist  goes  into  the  Delayed  Entry  Program  or  back  to  Reserve  or 
National  Guard  units.  Only  about  5%  are  straight  shippers.  They 
can't  ship  any  longer.  They  must  wait  at  least  one  day  for  the 
test results to come back. We wanted to be very accurate in the
preparation of our specimens. With all the traffic involved, we had
some difficulty in testing initially, but hopefully that has been
corrected.

Applicants  to  the  Delayed  Entry  Program  are  currently  being  tested 
at  the  training  centers  in  order  to  allow  the  MEPS  to  continue 
their  normal  processing  flow.  However,  by  1  October  the  MEPS  will 
take  over  the  responsibility  for  all  AIDS  testing  except  for  active 
duty  personnel.  As  you  may  know  the  plan  is  to  begin  testing  the 
active  duty  force  in  the  future. 

Finally  we  wanted  to  be  very  careful  in  notifying  the  applicants 
that  they  were  carrying  the  antibody.  We  decided  to  return  them  to 
the  MEPS  and  tell  them  face  to  face  as  opposed  to  sending  them  a 
registered  letter  which  we  studied  as  an  alternative. 

To develop our plan we worked with a number of the MEPS commanders,
with advice from the service recruiting commands. We could have
gone with a decentralized system very easily but the cost would
have run to $16 or $18 per test. This high cost and the concern
over quality control forced us to a national contract: one lab
which would pick up nationwide and overseas and do the lab tests in
one location. This would allow Walter Reed and other institutions
to  perform  very  specific  quality  control  procedures. 

We then had all of our medical Noncommissioned Officers in Charge
(NCOICs)  come  into  Chicago  so  we  could  train  them  very  carefully. 

We  also  brought  in  all  our  medical  officers  -  each  of  the  70  MEPS 
has  one  full  time  chief  medical  officer.  Some  of  the  larger  ones 
have  two.  These  are  augmented  with  400  fee  based  physicians.  The 
average  age  of  medical  officers  is  60.  We  had  four  over  the  age  of 
80.  Just  getting  them  to  Chicago  was  an  effort.  But  they  really 
were  enthusiastic  about  the  program.  The  medical  officers  and  the 
MEPS  commanders  have  given  the  program  their  full  support. 

USMEPCOM  began  testing  15  October;  the  Navy,  Air  Force  and  Marine 
training  centers  started  1  October;  and  the  Army  training  centers 
started  10  October.  We  had  already  completed  over  17,000  tests  by 
last  Friday  and  we  are  moving  forward. 

How  many  positives  are  out  there?  It  is  emerging  data;  I  don't 
know yet. I know the studies we did at Fort Benning showed 8
positives per 1,000 for 1,400 tested. We don't know what the
numbers  are  going  to  be  in  this  program  but  we  hope  it  is  going  to 
be  less  than  8  per  1,000. 

The contract went to Damon Corporation Medical Labs in Needham, MA.
The  cost  was  $4.41  per  test  as  opposed  to  the  average  of  $16  to  $18 
for  decentralized  testing.  All  tests  will  go  to  Dallas  by 
overnight  express  and  the  results  electronically  relayed  to  us  each 
day  before  two  o’clock.  We  test  by  doing  the  ELISA  Screening  Test 
twice.  If  the  specimen  is  tested  positive  twice  it  is  sent  to 
Chicago  for  the  more  specific  Western  Blot  Confirmatory  Test.  And 
if  this  test  proves  positive  only  then  do  we  call  the  individual  in 
to  inform  him  or  her  of  the  results. 

With  the  ELISA  Screening  Test  we  get  the  negatives  back  in  one  day. 
So  if  we  pick  up  by  noon,  by  noon  the  following  day  the  results  are 
back.  This  allows  us  to  ship  all  our  applicants  except  those  who 
were  ELISA  positive.  It  takes  three  days  to  get  the  results  of  the 
Western  Blot  Confirmatory  Tests  back  to  the  MEPS. 

Issues  of  confidentiality  are  obviously  a  great  concern  for  us.  At 
the  time  the  individual  comes  in  for  the  physical  exam  we  tell  them 
that  they  are  going  to  be  tested  for  the  HTLV-III  antibody.  They 
sign  an  acknowledgement  form  which  amounts  to  an  informed  consent 
so  it  is  not  a  complete  surprise  if  they  are  called  back  with  a 
positive  result. 

The  release  of  the  information  will  be  a  problem.  We  are  handling 
it  through  the  Office  of  the  Secretary  of  Defense  General  Counsel. 
All  requests  are  being  sent  there  for  disposition.  Specific 
requests may go through the Centers for Disease Control in Atlanta. We
don't  know  at  this  time.  We  won’t  be  indiscriminately  releasing 
any  information  concerning  individuals. 

The  notification  process  entails  sending  a  letter  to  the  individual 
to  return  to  the  MEPS  about  a  medical  problem.  The  recruiter  who 
made  the  initial  contact  with  the  individual  and  his  or  her  family 
will  go  to  the  home  and  pick  up  the  applicant  and  return  with  the 
applicant  to  the  MEPS.  The  individual  will  be  informed  of  the 
positive  results  by  the  chief  medical  officer  with  the  commander 
present.  After  notification  a  second  blood  sample  will  be  drawn. 

We  are  doing  that  to  be  100%  positive.  The  individual  will  then  be 
returned  home  by  the  recruiter.  We  will  inform  the  individual  of 
the  results  of  the  second  test  by  registered  mail,  return  receipt 
requested,  or  by  telephone;  probably  both.  If  the  second  test 
comes  back  negative  -  we  don't  expect  to  have  any  of  these  -  we 
have  to  research  every  aspect  of  the  case  to  find  out  why. 

A  second  letter  is  actually  handed  to  the  individual  after  the 
interview  telling  him  or  her  that  they  have  the  antibody,  but  it 
doesn't  mean  they  have  AIDS;  that  they  may  or  may  not  contract  the 
disease;  and  that  they  should  see  their  private  physician  to  seek 
further  advice  about  their  health. 

We  also  have  a  fact  sheet  developed  at  our  request  by  the  doctors 
at Walter Reed. The fact sheet is very good and is used by our own
people  to  overcome  the  concern  they  have  about  the  disease.  Put 
yourself  in  the  place  of  the  recruiter  who  has  to  pick  up  an 
applicant  and  then  come  maybe  three  hundred  miles  in  six  hours  to 
the MEPS and then return home with the individual. This is a very
difficult  task.  The  fact  sheet  helps  calm  the  fears  of  all 
involved. 

We  have  also  developed  a  workshop  program  to  give  the  recruiter 
training  in  how  to  handle  these  applicants.  In  developing  this 
training  we  used  social  workers,  psychiatrists,  and  suicide 
counselors. They came up with some do's and don'ts to reinforce
and  give  confidence  to  the  recruiters  before  they  have  to  deal  with 
this  situation.  We  have  included  in  this  training  a  list  of 
questions  and  answers  which  address  possible  problems  the  recruiter 
could  encounter  while  bringing  in  an  applicant. 

We  have  about  7,000  recruiters  and  all  of  them  will  be  faced  with 
transporting an ELISA-positive applicant at one time or another. We
are  taking  this  workshop  to  the  recruiting  schools  where  it  will 
become  part  of  their  training. 

This  is  the  end  of  my  prepared  remarks.  Thank  you  for  your  time 
and  attention. 


Rule Space: A Model for Identifying
Erroneous Rules of Operation

Introduction. It is well known that a student's total score on a test does
not tell the whole story (in fact it often tells very little) about the
student's achievement level and even less about the kinds of incorrect notions
he or she has about the subject matter being tested. To get something close
to the whole picture, we need to examine the student's response pattern, which
is a vector of 1's and 0's, representing right and wrong, respectively, on the
successive items, like [1,0,0,1,1,1,1,0,...,0,1,1].

However,  since  the  vector  will  have  as  many  elements  as  there  are  items 
on  the  test,  it  is  clear  that  we'd  be  hard  put  to  make  much  sense  out  of  it 
unless  the  test  is  quite  short.  What  is  needed,  therefore,  is  some  way  to 
summarize  the  information  contained  in  a  response-pattern  vector  by  means  of 
some  kind  of  numerical  index. 

Suppose  that  an  achievement  test  has  been  given  to  a  sizable  group  of 
students,  and  the  facility  level  of  each  item  (i.e.,  the  percentage  of  the 
group  that  got  each  item  right)  has  been  determined.  Suppose,  further,  that 
the  items  have  been  rearranged  from  the  easiest  to  the  hardest  in  recording 
the 1's and 0's (for right and wrong) in the successive response-pattern
vectors. What sort of vectors would we expect to find predominantly in the
set of vectors for the whole group? From the way the items have been
arranged, we should find most vectors to have more 1's toward the left and
more 0's toward the right-hand part, representing a pattern of responses in
which more of the easier items were passed and more of the harder ones were
failed. The "ideal" pattern would be one in which all the 1's precede all the
0's. Such a vector may be called a "Guttman vector," in analogy with what are
called Guttman scales in attitude scaling. Of course, Guttman vectors would
be found rarely if at all in any set of real response-pattern vectors. Most
of the vectors would show some 0's interspersed among the predominant 1's in
the left part and some 1's among the 0's in the right-hand part.
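
The reordering and the Guttman check described above can be sketched in a few lines of Python. This is only an illustration: the response matrix and the function names (`facility_levels`, `reorder_easiest_first`, `is_guttman_vector`) are hypothetical and do not come from the paper.

```python
def facility_levels(responses):
    """Proportion of students who answered each item correctly."""
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_students for j in range(n_items)]


def reorder_easiest_first(responses):
    """Rearrange the item columns from the easiest to the hardest."""
    facility = facility_levels(responses)
    # sorted() is stable, so items with equal facility keep their original order
    order = sorted(range(len(facility)), key=lambda j: -facility[j])
    return [[row[j] for j in order] for row in responses], order


def is_guttman_vector(pattern):
    """True when all the 1's precede all the 0's."""
    seen_zero = False
    for x in pattern:
        if x == 0:
            seen_zero = True
        elif seen_zero:
            return False
    return True


# Hypothetical data: 5 students (rows) by 4 items (columns), 1 = right, 0 = wrong
responses = [
    [1, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
]
reordered, order = reorder_easiest_first(responses)
flags = [is_guttman_vector(row) for row in reordered]
```

In this toy data set three of the five reordered vectors are Guttman vectors; real data would normally show far fewer.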

In several vectors, we may find a rather random assortment of 1's and
0's throughout; in a few, we may even find a predominance of 0's occupying the
left part (corresponding to the easier items) and more 1's in the right-hand
(i.e., the harder-items) part. Both these types of response vectors would
have to be regarded as anomalous or "aberrant," because the first type shows
just about the same proportions of the easier and the harder items being
passed, while in the second type more of the harder items are passed than the
easier items. Thus an index that shows the extent to which a given response
vector approaches a Guttman vector or the opposite extreme in which all the
0's precede all the 1's (a "reverse Guttman vector") could serve as a measure
of the "typicality" or "atypicality" of that vector. Such an index was, in
fact, developed at an early phase of our project, and it was called the "Norm
Conformity Index" (NCI). However, this index has the defect that it defines
"typicality" with reference to an "ideal" standard or norm that is virtually
unrealized in practice.

Extended Caution Indices. The above circumstance led us (mainly Kikumi
Tatsuoka) to seek a measure of typicality/atypicality that is based on a
probabilistic model, namely Item Response Theory (IRT). In highly simplified
terms, what our model does is first use IRT with the one- or two-parameter
logistic function to estimate the student parameter $\theta$ and the item parameter a
(or parameters a and b) for each student and item in a data matrix. Then, for
each student, the IRT-based probability for passing each item is computed, and
a matrix $(P_{ij}(\theta_i))$ is constructed. This could be called an IRT-based
theoretical data matrix, with the 0-1 entries replaced by the $P_{ij}(\theta_i)$'s.
Finally, an index analogous to the NCI is computed for each student,
representing the extent to which his/her observed response vector deviates
from the corresponding vector of probabilities based on IRT, relative to the
extent to which the group's average response vector deviates from that
IRT-probability vector.
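
As a rough illustration of the IRT-based theoretical data matrix, the sketch below fills a students-by-items matrix with two-parameter logistic probabilities. It assumes the parameters have already been estimated (the estimation step itself is not shown), uses the plain logistic form without any scaling constant, and all values and names (`p_correct`, `probability_matrix`) are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """Two-parameter logistic probability of passing an item.

    theta: student ability, a: item discrimination, b: item difficulty.
    Assumed form: P = 1 / (1 + exp(-a * (theta - b))).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def probability_matrix(thetas, items):
    """One row per student, one column per item."""
    return [[p_correct(theta, a, b) for (a, b) in items] for theta in thetas]


# Hypothetical, already-estimated parameters
thetas = [-1.0, 0.0, 1.0]                      # three students
items = [(1.0, -1.0), (1.0, 0.0), (1.5, 1.0)]  # (a, b) for three items
P = probability_matrix(thetas, items)
```

Each row of `P` takes the place of one student's 0-1 response vector in the index computation.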

Actually, several such indices, called "Extended Caution Indices" (ECI),
can be defined, depending on just what IRT-based quantities are used to replace
the 1's and 0's of the Guttman vectors and their means over students. Five
such ECI's were defined and discussed by K. Tatsuoka and Linn (1983).

It turned out that the one that was called ECI4, when standardized, served
the best as an index for detecting anomalous response patterns in a group.
(The standardization is for the purpose of making the values comparable over
different $\theta$'s.) This was denoted by $\zeta$ and has been used exclusively in all our
subsequent work. It also has the convenient property that, prior to being
standardized, it is interpretable as a linear mapping function:

$$
\begin{aligned}
f(\mathbf{x}) &= (\mathbf{P}(\theta) - \mathbf{x})'(\mathbf{P}(\theta) - \mathbf{T}(\theta)) \\
&= \sum_{j=1}^{n} (P_j(\theta) - x_j)(P_j(\theta) - T(\theta)) \\
&= -(\mathbf{P}(\theta) - \mathbf{T}(\theta))'\mathbf{x} + (\mathbf{P}(\theta) - \mathbf{T}(\theta))'\mathbf{P}(\theta),
\end{aligned}
$$

which associates with each response pattern $\mathbf{x}$ a real number $f(\mathbf{x})$. Here

$$\mathbf{T}(\theta) = [T(\theta), T(\theta), \ldots, T(\theta)]'$$

is a vector whose n elements are all equal to

$$T(\theta) = \frac{1}{n}\sum_{j=1}^{n} P_j(\theta),$$

which is the mean, over items, of the IRT-based response probabilities for a
fixed $\theta$.

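The expected value and variance of $f(\mathbf{x})$ for fixed $\theta$, denoted $f_\theta(\mathbf{x})$, were
shown by K. Tatsuoka (1985) to be

$$E(f_\theta(\mathbf{x}) \mid \theta) = 0$$

and

$$\mathrm{Var}(f_\theta(\mathbf{x}) \mid \theta) = \sum_{j=1}^{n} P_j(\theta)\,Q_j(\theta)\,[P_j(\theta) - T(\theta)]^2,$$

respectively, where $Q_j(\theta) = 1 - P_j(\theta)$.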
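
These two moments can be checked numerically on a small example. The sketch below enumerates every response pattern for a short test, weights each by its probability under the usual IRT assumption of independent item responses at a fixed ability, and compares the resulting mean and variance of f with the closed forms. The probabilities in `p` are hypothetical, and the function names are illustrative only.

```python
from itertools import product


def f_value(x, p):
    """f(x) = sum_j (P_j - x_j) * (P_j - T), with T the mean of the P_j."""
    t = sum(p) / len(p)
    return sum((pj - xj) * (pj - t) for pj, xj in zip(p, x))


def f_variance(p):
    """Closed-form variance: sum_j P_j * Q_j * (P_j - T)^2, with Q_j = 1 - P_j."""
    t = sum(p) / len(p)
    return sum(pj * (1.0 - pj) * (pj - t) ** 2 for pj in p)


def pattern_probability(x, p):
    """Probability of pattern x, assuming independent item responses."""
    prob = 1.0
    for pj, xj in zip(p, x):
        prob *= pj if xj == 1 else (1.0 - pj)
    return prob


# Hypothetical passing probabilities P_j(theta) for one fixed theta
p = [0.9, 0.7, 0.5, 0.2]
patterns = list(product([0, 1], repeat=len(p)))

mean_f = sum(pattern_probability(x, p) * f_value(x, p) for x in patterns)
var_f = sum(pattern_probability(x, p) * (f_value(x, p) - mean_f) ** 2
            for x in patterns)
```

Up to floating-point error, `mean_f` is zero and `var_f` agrees with `f_variance(p)`.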


Rule Space. With the standardized mapping function

$$\zeta = f_\theta(\mathbf{x}) / [\mathrm{Var}(f_\theta(\mathbf{x}) \mid \theta)]^{1/2}$$

defined above, we can now map each student's response pattern $\mathbf{x}$ into a point
$(\theta, \zeta)$ in a two-dimensional space with abscissa $\theta$ and ordinate $\zeta$. It was
shown that the maximum likelihood estimate of $\theta$, MLE $\hat{\theta}$, and $\zeta$ are uncorrelated.


This space is called "Rule Space," because, if a student consistently uses
some specific rule of operation (or algorithm) in solving all the items on a
test, each rule R will yield a unique response pattern $\mathbf{x}_R$ (or just R, for short).
Thus the correct rule will yield the response pattern [1,1,1,...,1], and each
incorrect rule will yield some specific permutation of 1's and 0's (including
all 0's, of course).


Consequently, the rules that yield response patterns leading to the same
MLE $\hat{\theta}$ will all be mapped into points that lie on a straight line perpendicular
to the $\theta$ axis. (Such response patterns will have the same number, $\sum x_j$, of 1's
in the case of the one-parameter logistic model, and the value of the sufficient
statistic, $\sum a_j x_j$, will be equal in the case of the two-parameter model.)


Thus, what $\zeta$ does is to pull apart students who have the same total score (or
the same sufficient statistic for $\theta$) on a test. Students whose response
patterns are typical of their group (and hence whose $\zeta$ values are small) will
be represented by points close to the $\theta$ axis, while those whose response
patterns are unusual for the group (large $\zeta$ values) will be mapped into points
that are above the $\theta$ axis: the more unusual their pattern, the farther up
their points will be.
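
A small numerical sketch may make this concrete. It takes two response patterns with the same total score, a Guttman pattern and its reverse, and computes the standardized index for each. The passing probabilities are hypothetical values for one fixed ability level, listed easiest item first, and the function name `zeta` is illustrative only.

```python
import math


def zeta(x, p):
    """Standardized index: f(x) divided by the square root of its variance."""
    t = sum(p) / len(p)
    f = sum((pj - xj) * (pj - t) for pj, xj in zip(p, x))
    var = sum(pj * (1.0 - pj) * (pj - t) ** 2 for pj in p)
    return f / math.sqrt(var)


# Hypothetical passing probabilities for six items, easiest first
p = [0.9, 0.8, 0.6, 0.4, 0.2, 0.1]

guttman = [1, 1, 1, 0, 0, 0]   # easy items right, hard items wrong
reverse = [0, 0, 0, 1, 1, 1]   # same total score, opposite pattern

z_guttman = zeta(guttman, p)
z_reverse = zeta(reverse, p)
```

Both patterns have a total score of 3, yet the reverse pattern receives a large positive value while the Guttman pattern receives a negative one, so the two students are separated along the vertical axis.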

But  now  comes  the  rub.  The  neat  mapping  indicated  above  was  predicated 
on  the  assumption  that  each  student  uses  some  specific  rule  consistently 
throughout  the  test.  But  of  course  this  will  not  be  true  in  practice.  There 
are  bound  to  be  some  random  departures  from  the  use  of  a  constant  rule  in  a 
few  of  the  items.  Then  the  student's  response  pattern  will  not  be  mapped  into 
a  point  that  corresponds  to  consistent  use  of  a  single  rule.  Rather,  the  point 
will  fall  in  the  vicinity  of  the  point  for  the  rule  that  has  been  used  for 
solving  most  of  the  items.  These  may  be  called  "perturbations"  from  the  "pure" 
rule  point.  Or,  if  we  want  to  be  more  specific  about  the  extent  of  perturba¬ 
tion — i.e.,  the  number  of  items  on  which  there  was  a  departure  from  the  modal 
rule — we  may  call  the  points  "one-slip  points,"  "two-slip  points,"  and  so  on, 
Tatsuoka  and  Tatsuoka  (submitted  to  Fsychometrlka)  have  shown  that  the  proba¬ 
bility  of  there  being  no  more  than  some  number  s  (<n)  of  slips  in  a  test  with 
n  items  is  given  by  a  compound  binomial  distribution.  Hence,  asymptotically, 
the  perturbed  points  will  be  distributed  normally  around  the  pure  rule  point 
from  which  they  are  perturbations.  Moreover,  under  reasonable  assumptions, 
the  variance  of  this  distribution  will — at  least  for  rule  points  that  are  not 
too  far  from  the  8  axis — be  equal  to  the  variance  that  was  previously  displayed 
for  fg(x),  namely,  £j!l]PjQj[Pj(e)  ~  T(o)]  ,  This,  along  with  the  facts  that 
0  is  normally  distributed  with  mean  8  and  variance  l/l(e)  and  that  and  *0  are 
uncorrelated,  allows  us  to  conclude  that  the  perturbed  points  will  follow  a 
bivariate  normal  distribution  with  a  known  centroid  and  a  known,  diagonal 
covariance  matrix.  Consequently,  the  points  corresponding  to  response  patterns 
with  no  more  than  a  certain  number  of  slips  will  lie  inside  ellipses  whose 
minor  and  major  axes  are  parallel  to  the  reference  axes  of  rule  space.  These 
ellipses  will  constitute  various  iso-density  ellipses  of  the  bivariate  normal 
distributions  around  the  pure  rule  point.  The  upshot  is  that,  if  all  the  rule 
points  and  the  observed  response  points  that  are  perturbations  from  them  were 
to be plotted in rule space, the result would be something like this: There
would be several swarms of points, each of which would be most densely
concentrated around one of the pure rule points, and would become sparser and sparser
as  we  go  farther  from  the  center  (i.e.,  the  rule  point)  in  any  direction. 

We can now explicate just what is meant by the title of this paper,
"a model for identifying erroneous rules of operation," as follows: It is a
technique whereby, given a student's point $(\hat{\theta}_i, \zeta_i)$ in rule
space, we are enabled to decide to which one of the ellipses this point most
likely belongs. This is because we have, through this model, translated our
original problem into a typical problem of statistical decision theory: the
problem of classifying a given point into one of several bivariate normal
populations or, more generally (as we shall soon see), into one of several
2m-variate normal populations.

One of the ways for solving such a problem is to invoke the "minimum-$D^2$
rule," where $D^2$ is the (squared) Mahalanobis generalized distance. Without
loss of generality, we may assume that we have eliminated all but two rule
points $R_1$ and $R_2$ as candidates for the rule point from which the given
student's response point is a perturbation. If we denote the common covariance
matrix of the two bivariate normal distributions with $R_1$ and $R_2$ as their
centers by $\Sigma$, then the $D^2$ of the given response point from the
centroid $R_k$ of the Rule k ellipse is

$$D^2_{xk} = [(\hat{\theta}_x, \zeta_x)' - R_k]' \, \Sigma^{-1} \, [(\hat{\theta}_x, \zeta_x)' - R_k], \qquad k = 1, 2.$$

Upon computing the two $D^2$s, we would classify x as a perturbation from $R_1$
if $D^2_{x1} < D^2_{x2}$, otherwise as a perturbation from $R_2$.
Once we have decided what particular rule it is that the student’s
response pattern is most likely to be a perturbation from (i.e., what rule
he/she most likely used almost consistently except for a few "slips"), it is a
short  step  to  diagnosing  the  particular  misconception  that  was  most  likely 
held  by  the  student  for  him/her  to  have  adopted  that  rule  in  the  first  place. 
This  is  because  the  rules  were  originally  inferred  from  a  careful  error 
analysis of actual test papers by experienced subject-matter specialists of
the  material  covered  by  the  test. 
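The minimum-$D^2$ rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two centroids, the diagonal covariance matrix, and the student's point are made-up values, and the function names are hypothetical rather than taken from the authors' software.

```python
def squared_mahalanobis(x, centroid, cov):
    """Return D^2 = (x - R)' Sigma^{-1} (x - R) for a 2-D point and a 2x2 covariance."""
    d0 = x[0] - centroid[0]
    d1 = x[1] - centroid[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # The inverse of [[a, b], [c, d]] is (1/det) * [[d, -b], [-c, a]].
    return (d0 * (d * d0 - b * d1) + d1 * (-c * d0 + a * d1)) / det


def classify_min_d2(x, centroids, cov):
    """Return (index of the nearest rule point, list of D^2 values)."""
    d2 = [squared_mahalanobis(x, r, cov) for r in centroids]
    return d2.index(min(d2)), d2


# Hypothetical rule points (theta, zeta) sharing one diagonal covariance matrix.
R1 = (-0.5, 1.0)
R2 = (-0.5, 2.5)
SIGMA = ((0.25, 0.0), (0.0, 1.0))

student_point = (-0.3, 1.4)  # hypothetical (theta-hat, zeta) for one student
nearest, distances = classify_min_d2(student_point, [R1, R2], SIGMA)
```

With these illustrative numbers the student's point is closer to the first rule point, so the response pattern would be treated as a perturbation from that rule.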


Application  to  a  Test  on  Subtraction  of  Signed  Numbers.  Suppose  that  a  test 
consisting  of  the  ten  items  listed  in  the  first  column  of  Table  1  (without  the 
answers,  of  course)  was  given  to  a  group  of  seventh-graders,  and  that  four  of 
the  students  each  used  one  of  the  four  erroneous  rules  described  at  the  bottom 
of the table. The four pairs of columns, each pair headed by a rule number,
show the answers obtained by using the four rules, respectively, and the
binary score (1 or 0) for each item.
Table 1. The Binary Response Vectors for a Set of Ten Items Responded to by
Four Incorrect Rules

                        Rule 1           Rule 2           Rule 3           Rule 4
Item                    Response  x_j*   Response  x_j    Response  x_j    Response  x_j

-3 - (-7) = +4             -4      0        +4      1        +4      1        +4      1
-2 - 8 = -10               +6      0        +6      0        +6      0       -10      1
5 - (-12) = +17            -7      0        +7      0       +17      1        -7      0
-11 - +8 = -19             -3      0        +3      0       -19      1       -19      1
9 - 4 = +5                 +5      1        +5      1        +5      1        +5      1
-15 - (-9) = -6            -6      1        +6      0        -6      1        -6      1
-13 - 5 = -18              -8      0        +8      0        -8      0       -18      1
8 - (-6) = +14             +2      0        +2      0       +14      1        +2      0
-5 - +11 = -16             +6      0        +6      0       -16      1       -16      1
1 - 10 = -9                +9      0        +9      0        -9      1        -9      1

* x_j is the score for the jth item in the binary response vector x.

Rule 1: The student subtracts the smaller absolute value from the larger
absolute value and takes the sign of the number with the larger absolute value
in his/her answer.

Rule 2: The two numbers are always subtracted as seen in Rule 1, but the
+ sign is always taken in the answer.

Rule 3: The student converts -2 - 8 and -13 - 5 into -2 + 8 and -13 + 5,
respectively, but the other eight items are converted to addition correctly.
Then the right addition rule is used to answer them.

Rule 4: The student has a strange idea about the parentheses. Converts
operation sign, -, to +, first. Then he/she follows the rule: if the signs
of the two numbers are minus, then change the sign of the second number to
a +; if the signs of the two numbers are not alike, then the sign of the
second number becomes a minus.




We  see  from  Table  1  that  Rules  1  and  2  both  yield  correct  answers  in  two 
of  the  ten  problems,  but  not  the  same  two.  Similarly,  if  a  student  consistently 
uses either Rule 3 or Rule 4, he/she would get eight problems right (again,
not the same eight). Thus, consistent use of either of the first two rules or
either of the second two would (assuming the discrimination parameters $a_j$
to be equal for all of the items) respectively yield the same estimated
$\theta$ values. However, examining the descriptions of the four rules at the
bottom of the table, we see Rule 2 represents greater ignorance than Rule 1,
and Rule 3 might be the result of careless slips while Rule 4 is quite a
"weird" one. Therefore,
consistent users of Rule 1 and those of Rule 2 should not be treated alike.
This is probably true for the sheer purpose of gauging their achievement in
signed-number subtraction, and is certainly true for the purpose of diagnosing
their misconceptions and giving them remedial instruction; similar remarks
hold for users of Rules 3 and 4. This is where the ECI $\zeta$ comes into play.
Students who get the same $\theta$ by virtue of consistently using Rule 1 and
Rule 2, respectively, would be pulled apart by their different $\zeta$ values,
those using Rule 2 being plotted higher up in rule space (assuming the group
to which they belong is reasonably competent). Likewise, those using Rule 4
would come higher up than Rule-3 users, because Rule 4 is a much more unusual
rule.

Readers who are interested in further details on how $\zeta$ works are referred to
K. Tatsuoka (1983, 1985).
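The claim that Rules 1 and 2 each yield two correct answers, but not the same two, can be checked mechanically. The sketch below encodes the ten items of Table 1 as (first number, second number) pairs and implements Rules 1 and 2 as they are described under the table. The function names are hypothetical, and the eighth item is read as 8 - (-6), which is the reading consistent with its listed answer of +14. Rules 3 and 4 are not coded here because their descriptions depend on how each item is written rather than on the two numbers alone.

```python
# Each item is (a, b), standing for the subtraction a - b.
ITEMS = [(-3, -7), (-2, 8), (5, -12), (-11, 8), (9, 4),
         (-15, -9), (-13, 5), (8, -6), (-5, 11), (1, 10)]


def rule1(a, b):
    """Smaller absolute value from the larger, signed like the number with the larger absolute value."""
    magnitude = abs(abs(a) - abs(b))
    larger = a if abs(a) > abs(b) else b
    return magnitude if larger > 0 else -magnitude


def rule2(a, b):
    """Same subtraction of absolute values as Rule 1, but the answer is always positive."""
    return abs(abs(a) - abs(b))


def binary_scores(rule):
    """Score 1 when the rule happens to give the correct difference, else 0."""
    return [1 if rule(a, b) == a - b else 0 for a, b in ITEMS]


scores_rule1 = binary_scores(rule1)
scores_rule2 = binary_scores(rule2)
```

Both score vectors sum to 2, yet they differ in which items are answered correctly, which is why a total score (or an estimated ability based on equally discriminating items) cannot separate the two rules.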

Sometimes,  even  the  two-dimensional  rule  space  will  not  suffice  to  pull 
the different rule points apart enough. In fact, tests of signed-number
subtraction are a case in point. In such cases it may be possible to achieve
better resolution, as it were, by introducing subscores for each item; e.g.,
a subscore for the absolute value of the answer being right or wrong, and
another subscore for the sign of the answer. Then there will be one $\theta$ for
each "component" and likewise one $\zeta$ for each, thus generating a rule space of
four dimensions, or more generally 2m dimensions, when there are m components.

Application  to  Adaptive  Testing.  The  ideas  of  rule  space  can  also  be  used  in 
speeding up the "convergence" in computerized adaptive tests, i.e., for getting
a stable estimate of a student's $\theta$ value by administering, on the average, a
much smaller number of items than in the traditional approach in which only
the information value $I(\theta)$ is considered in choosing the next item. (See Lord,
1980.) In highly simplified terms, the idea is to have at hand (i.e., stored
in the computer) a curve of $P_j(\theta) - T(\theta)$ plotted against $T(\theta)$ for each item
in the pool. These curves will be called "Item Disparity Curves" (IDC) for
subsequent reference. Then, in selecting the next item to be given to a
student, we take into consideration, in addition to the $I(\theta)$, the IDC of each
item remaining in the pool, and choose an item whose IDC is as far removed as
possible from the IDC of the item just taken, at the current estimate
$\hat{\theta}^{(k)}$ of the student's $\theta$ value.
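One way to picture the selection step is the sketch below. It is not the authors' procedure: the text does not say how the information value and the IDC are combined, so the rule used here (shortlist the remaining items whose information is within a fraction of the best, then take the one whose disparity differs most from that of the item just given) is an assumption, as are the two-parameter logistic item model, the function names, and the item parameters.

```python
import math


def p_correct(theta, a, b):
    """Assumed two-parameter logistic item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def information(theta, a, b):
    """Item information for the assumed logistic model: a^2 * P * Q."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next(theta_hat, pool, remaining, last, info_fraction=0.8):
    """Pick the next item from `remaining`, given the item `last` just administered."""
    probs = {j: p_correct(theta_hat, *pool[j]) for j in pool}
    t = sum(probs.values()) / len(probs)           # T(theta): mean of the P_j
    disparity = {j: probs[j] - t for j in pool}    # IDC height at theta_hat
    info = {j: information(theta_hat, *pool[j]) for j in remaining}
    cutoff = info_fraction * max(info.values())
    shortlist = [j for j in remaining if info[j] >= cutoff]
    # At a single theta the T term cancels, so this compares P_j with P_last.
    return max(shortlist, key=lambda j: abs(disparity[j] - disparity[last]))


# Hypothetical pool: item id -> (discrimination a, difficulty b).
POOL = {1: (1.0, 0.0), 2: (1.0, 0.1), 3: (1.0, -0.8), 4: (0.4, 2.0)}
chosen = select_next(0.0, POOL, remaining=[2, 3, 4], last=1)
info_only = select_next(0.0, POOL, remaining=[2, 3, 4], last=1, info_fraction=1.0)
```

In this toy pool, selecting on information alone picks item 2, which behaves almost exactly like the item just given, while the disparity-aware choice picks item 3.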

Space  does  not  permit  indicating  the  proof  that  the  method  outlined  above 
will indeed speed up the convergence to the "correct" $\theta$ value, but it is
soundly based in functional analysis, which is a branch of mathematics that
has many important applications in statistics and psychometrics but which, to
quote Ramsay (1980), "...we in North America are handicapped by the fact that
a course in [it] is seldom a part of the preparation of an applied statistician"
(1982, p. 394). A proof has been made by K. Tatsuoka, and it will soon be
available in a CERL Research Report of the ONR-sponsored Computerized
Adaptive Testing and Measurement (CATM) Project.
Appendix

Proof that f(x) Increases with Unusualness of Response Pattern x. It was
stated without proof, on the third page of this paper, that the standardized
version, $\zeta$, of f(x) becomes larger as the response pattern x becomes more
"unusual" for the group of which the responder is a member. We now prove this.

Expanding the second expression for f(x) given on page two, we get

$$f(x) = \sum_{j=1}^{n} P_j(\theta)[P_j(\theta) - T(\theta)] - \sum_{j=1}^{n} x_j[P_j(\theta) - T(\theta)].$$

We may assume, without loss of generality, that the items have been numbered
in ascending order of difficulty so that, for each $\theta$, we have

$$P_1(\theta) > P_2(\theta) > \cdots > P_n(\theta).$$

(The order may differ for different values of $\theta$.) Then, since $T(\theta)$ is the
mean of the n $P_j(\theta)$ values, there must be some k such that

$$P_j(\theta) - T(\theta) > 0 \quad \text{for all } j \le k$$

and

$$P_j(\theta) - T(\theta) < 0 \quad \text{for all } j > k.$$

This, together with the fact that the first sum in the above expression for
f(x) is a constant for any fixed $\theta$, implies that for any response pattern that
has a small number of $x_j = 1$ for $j \le k$ (i.e., the easier items) and a large
number of $x_j = 1$ for $j > k$ (the harder items), the quantity f(x) will have a
larger numerical value than for a response pattern for which the opposite is
true. But the first-mentioned type of response pattern (with a small number
of corrects among the easier items and a large number among the harder items)
is clearly an "unusual" or atypical one. This shows that, for a fixed $\theta$, the
larger f(x) is, the more unusual is the response pattern x for the group in
which the items were calibrated. The standardization to get $\zeta$ makes this
property hold across different values of $\theta$.
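A small numerical check of the argument above is given below. The five success probabilities are invented for illustration (they are not item calibrations from the paper), and `f_value` is a hypothetical name. It evaluates the expanded expression for f(x) at one fixed $\theta$, in the equivalent form $\sum_j [P_j(\theta) - x_j][P_j(\theta) - T(\theta)]$.

```python
def f_value(x, probs):
    """f(x) = sum_j (P_j - x_j) * (P_j - T), where T is the mean of the P_j."""
    t = sum(probs) / len(probs)
    return sum((p - xj) * (p - t) for p, xj in zip(probs, x))


# Hypothetical P_j(theta) at one fixed theta, ordered from easiest to hardest item.
PROBS = [0.9, 0.8, 0.6, 0.4, 0.2]

typical = [1, 1, 1, 0, 0]   # correct on the easier items only
unusual = [0, 0, 1, 1, 1]   # correct mostly on the harder items

f_typical = f_value(typical, PROBS)
f_unusual = f_value(unusual, PROBS)
```

Both patterns have the same number of correct answers, but f(x) is larger for the pattern that misses the easy items and gets the hard ones right, as the proof predicts.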

Acknowledgement

... under Contract No. ..., Identification Number NR 150-495.
Implementation of a Computer System to Support
Diagnostic  Testing 


C.  David  Vale 

Assessment  Systems  Corporation 
St.  Paul,  Minnesota 

The MicroCAT™ Testing System, developed by Assessment Systems Corporation,
is  a  complete  system  of  computer  programs  to  support  computerized  adaptive  testing 
(CAT).  MicroCAT  runs  on  an  IBM  Personal  Computer  and  includes  facilities  for 
performing  all  of  the  functions  necessary  to  implement  adaptive  tests.  These  functions 
include  authoring  test  items,  calibrating  items  according  to  item  response  theory  (IRT) 
models,  authoring  tests  using  any  of  a  variety  of  CAT  strategies,  and  administering 
tests.  The  details  of  the  MicroCAT  system  have  been  described  elsewhere  (Assessment 
Systems Corporation, 1984; Vale, in preparation, a, b) and will not be detailed here.

MicroCAT  served  as  the  basis  for  the  testing  system  implemented  in  the  Basic 
Electricity  and  Electronics  (BE&E)  School  of  the  Naval  Training  Center  (NTC)  in  San 
Diego.  The  purpose  of  this  implementation  was  twofold:  to  provide  a  delivery  system 
for  the  diagnostic  testing  techniques  developed  by  the  University  of  Illinois,  and  to 
provide  an  environment  in  which  to  evaluate  the  MicroCAT  system,  which  was  not 
then  in  commercial  release. 

There were two general requirements that the system for NTC had to meet.
First, it had to administer the current NTC tests in a form that was comparable to
their  original  mode  of  administration.  (This  was  to  allow  the  collection  of  data  on  the 
current  forms  in  an  operational  testing  environment.)  Second,  the  system  had  to  be 
capable  of  administering  tests  according  to  the  prescribed  diagnostic  strategies. 

In  the  standard  mode  of  administration  at  NTC,  a  student  is  assigned  a 
microfiche  card  which  contains  a  test.  He  or  she  then  reports  to  a  carrel  with  a 
microfiche  reader  and  responds  to  the  test  questions  on  an  optically  scannable  answer 
sheet.  After  completing  the  test,  the  student  puts  the  answer  sheet  into  an  optical 
scanner  that  is  connected  to  a  computer  terminal  which,  in  turn,  is  connected  to  a 
mainframe  computer  in  Memphis,  Tennessee.  A  computer-managed  instruction 
program  running  on  that  computer  scores  the  results,  updates  the  student’s  records  in 
the  database,  and  reports  the  score  to  the  student.  It  also  tells  the  student  which  test 
to  take  next. 

Our  goal  in  designing  the  MicroCAT  interface  to  this  mode  of  testing  was  to  be 
able  to  install  the  computerized  testing  system  in  such  a  manner  that  it  would  be 
perceived  by  examinees  as  comparable  to  the  microfiche  version  of  the  test.  Its 
operation  also  had  to  be  completely  transparent  to  MIISA,  the  computer-managed 
instruction  program,  so  that  the  results  of  the  computerized  tests  could  be  used 
operationally  in  the  computerized  instructional  management  system.  MIISA  is  a  very 
complex  program  that  manages  all  of  the  instructional  assignments  and  record  keeping 
at  NTC.  It  was  written  several  years  ago,  runs  on  a  mainframe  computer,  and  is  very 
resistant  to  change  of  any  kind.  Any  approach  to  implementation  that  required 
reprogramming  of  MIISA  was  not  viable. 


Although  the  MicroCAT  system  was  originally  designed  for  stand-alone  testing,  a 
general networking capability was added to allow a proctor to assign and monitor tests
at  several  stations  from  a  single  proctor’s  station.  This  standard  MicroCAT 
networking  system  was  still  unable  to  communicate  with  MIISA,  but  the  proctoring 
station  provided  a  good  starting  point  for  the  connection.  To  achieve  the  transparency 
desired,  the  proctoring  program  was  adapted  so  that  it  could  interact  with  MIISA  as 
well  as  with  a  proctor.  It  was  changed  to  make  the  proctoring  terminal  emulate  one  of 
the terminals that normally communicated with the MIISA mainframe. To accomplish
this,  we  made  the  proctoring  station’s  serial  port  look  like  a  GE  Terminet  printing 
terminal  connected  to  an  optical  scanner.  As  far  as  MIISA  was  concerned,  it  was 
communicating  with  an  optical  scanner  connected  to  a  terminal.  When  an  examinee 
reported  to  the  testing  room,  the  proctor  assigned  him  or  her  to  a  testing  station, 
where  the  examinee  entered  his  or  her  social  security  number.  The  testing  station 
communicated  this  to  the  proctoring  station,  which  queried  MIISA  for  the  appropriate 
test  to  administer  and  assigned  it  to  the  examinee’s  station.  After  the  examinee 
completed  the  test,  the  proctoring  station  passed  the  responses  to  MIISA,  obtained  the 
scores,  and  printed  them  out  for  the  examinee  on  the  system  printer.  From  MIISA’s 
perspective,  the  computerized  system  was  capable  of  performing  all  of  the  functions 
usually  done  on  paper. 

Figure  1  diagrams  the  testing  system  as  implemented  at  NTC.  It  contained 
eighteen  terminals.  Fifteen  of  these  were  testing  stations,  two  were  network  servers 
(which  contained  all  of  the  tests  and  response  data),  and  one  was  a  proctoring  station. 
One  network  server  would  have  been  sufficient;  the  second  one  was  for  backup  in  case 
the  first  one  failed.  One  of  the  testing  stations  also  had  the  hardware  necessary  to 
convert  it  to  a  proctoring  station  if  the  first  one  failed. 

There  was  a  fear  among  the  Navy  Chiefs  in  charge  of  testing  that  it  might  be 
difficult  to  convince  the  examinees  that  the  computerized  mode  was  equivalent  to  the 
paper-and-pencil mode. Their concern was that if tests were administered in two
forms,  superstitions  would  develop  among  the  students  regarding  which  mode  was 
easier.  Obviously,  this  problem  would  be  exacerbated  if  there  was  any  substance  to 
the  claim. 

Specifically,  three  factors  were  initially  assumed  to  be  related  to  the  examinees’ 
acceptance  of  the  computerized  mode  as  an  equivalent  mode  of  testing:  (1)  the  system 
had  to  respond  quickly  with  the  next  item  after  the  examinee  answered  the  previous 
one,  (2)  it  could  not  lose  an  examinee’s  work  if  any  portion  of  the  system  failed,  and 
(3) it had to support standard test-taking strategies such as skipping difficult items
and  coming  back  to  them  later. 

MicroCAT  was  fast  enough  to  satisfy  the  first  requirement,  but  originally  it  did 
not  support  the  other  two  features.  Therefore,  a  recovery  feature  was  added  to  save 
the examinee’s responses on a diskette as soon as they were made. If a testing station
failed,  the  examinee  could  simply  remove  the  diskette  in  his  or  her  station,  put  it  in 
another  station,  enter  his  or  her  social  security  number,  and  continue  the  test  with  no 
loss  of  data. 

The  ability  to  skip  items  and  later  return  to  them  was  also  added  to  the 
MicroCAT  system.  At  the  end  of  the  test,  the  examinee  is  asked  if  he  or  she  would 
like to review all of the items, some specific ones, or just those he or she skipped. A
review is granted as desired, and the specified items are re-presented to the examinee
with his or her previous response shown. The examinee can then change the response
or leave it as it was.

Figure 1. Structure of the NTC Implementation


After  these  features  were  added  to  the  MicroCAT  system,  it  was  installed  for  a 
small-scale  evaluation.  In  general,  the  initial  system  ran  without  any  serious  errors. 
Several  small  problems  did  require  attention,  however.  First,  as  the  system  was 
originally  set  up,  it  was  possible  for  students  to  take  two  tests  simultaneously.  Since 
they  received  no  feedback  regarding  the  correctness  of  responses,  there  was  really  no 
advantage  to  be  gained  by  doing  this.  Nevertheless,  the  problem  was  corrected  by 
modifying  the  proctoring  program  to  keep  track  of  who  was  on  the  system  and  to 
allow  examinees  to  take  only  one  test  at  a  time.  The  second  potential  problem  was 
that  hitting  some  combinations  of  keys  could  abort  the  testing  system.  This  was  solved 
by  disabling  all  of  these  combinations  except  one  that  required  the  examinee  to 
simultaneously  push  three  keys.  It  is  highly  unlikely  that  anyone  would  accidentally 
hit  all  three  keys  at  once;  anyone  who  did  hit  them  would  probably  be  deliberately 
trying  to  abort  the  system.  It  was  not  possible  or  worthwhile  to  subvert  such  an 
attempt,  because  a  determined  examinee  could  always  reset  the  station  simply  by 
unplugging  it. 


The  system  was  implemented  operationally  in  August  of  1984.  The  test  items 
presented  were  similar  in  format  to  the  one  shown  in  Figure  2  (which  is  not  an  active 
test  item).  Most  examinees  had  no  trouble  with  the  new  system.  The  major  surprise 
on  the  first  day  of  testing  was  that,  on  the  average,  examinees  took  ten  minutes  to 
respond  to  an  item.  This  did  not  begin  to  tax  the  capacity  of  the  network,  which  was 
designed  to  handle  a  response  from  each  examinee  every  ten  seconds. 


Figure  2.  Sample  Electronics  Item 

Since  the  data  collected  by  computer  administration  were  to  be  analyzed  by  the 
University of Illinois, a data transfer scheme was needed. The MIISA link is a
real-time link in that testing waits for communication. Transferring the data to the
University  of  Illinois,  on  the  other  hand,  had  to  be  done  only  when  the  data  were 
needed  or  when  the  disks  on  the  NTC  network  were  full.  A  system  was  developed 
whereby  the  test  proctor  periodically  dumped  the  data  from  the  system  disks  to  two 
sets  of  diskettes,  one  for  the  University  of  Illinois  and  one  for  backup.  After 
dumping  the  data,  the  proctor’s  instructions  were  to  mail  one  set  to  the  University  of 
Illinois  and  to  keep  the  backup  set  until  receipt  was  confirmed.  The  data  on  the 
system  disk  were  erased  after  the  diskettes  were  made.  Except  for  the  difficulty  of 
getting  the  proctor  to  make  the  data  diskettes  on  a  regular  basis,  this  scheme  worked 
well. 
Testing has not been interrupted because of any system problems. It was
interrupted  for  several  weeks,  however,  by  the  implementation  of  new  versions  of  the 
tests.  The  frequent  changes  in  tests,  which  had  not  been  anticipated  when  the  system 
was  installed,  required  frequent  communication  with  the  University  of  Illinois.  It  had 
been  intended  that  the  University  of  Illinois  would  do  the  test  development  and  then 
either  manually  install  the  tests  in  the  San  Diego  system  or  mail  complete  test  files 
with  installation  programs  to  be  run  by  the  proctor.  However,  as  the  test  changes 
became  more  frequent,  it  became  apparent  that  it  would  be  more  efficient  for  NTC 
personnel  to  make  the  changes  themselves  and  install  the  tests. 

Test  development  in  the  MicroCAT  system  is  a  three-stage  process.  First,  the 
items  are  authored  using  the  system’s  Graphics  Item  Banker.  Then  the  test  is  specified 
using  an  authoring  language.  Finally,  the  authoring  language  is  compiled,  a  process 
that  reformats  the  items  and  processes  the  instructions  in  a  manner  that  allows  items 
to  be  presented  rapidly.  Implementing  a  test  in  the  NTC  system  required  the  further 
step  of  copying  the  compiled  test  onto  the  appropriate  disk  volume. 

NTC  test  administration  personnel  mastered  the  process  with  relative  ease. 
However,  several  problems  arose  that  had  bothered  us  in  the  early  part  of  this  effort 
but  which  we  had  forgotten  until  the  NTC  personnel  began  to  use  the  system.  One 
such  problem  was  that  if  diskettes  were  swapped  while  the  item  banker  was  running,  a 
bank  would  be  destroyed.  We  originally  circumvented  this  problem  by  not  swapping 
diskettes,  but  this  solution  was  obviously  not  optimal.  We  therefore  wrote  a  utility 
program  that  could  recover  a  bank  destroyed  in  this  manner.  The  other  problem  that 
we rediscovered was that, using the Ethernet network from 3Com, two people sharing
a  disk  volume  can,  under  certain  circumstances,  destroy  each  other’s  work.  For 
example,  NTC  personnel  recently  destroyed  an  item  bank  by  writing  portions  of  a 
memo  over  it.  Fortunately,  the  new  program  was  able  to  restore  most  of  what  was  lost. 

In  general,  the  implementation  at  NTC  has  been  successful.  The  MicroCAT 
system  has  flawlessly  performed  most  of  the  tasks  required  of  it.  More  than  2,400 
items  have  been  banked  for  this  application,  approximately  50  different  tests  have 
been  implemented,  and  over  1,500  tests  have  been  administered.  Informal  evidence 
from  the  BE&E  School  suggests  that  the  system  is  fast  enough  for  administering  all  of 
the tests and that the tests are perceived as psychologically parallel to the microfiche
form  (although  we  have  heard  that  the  computer  display  is  easier  to  read  than  the 
microfiche).  The  system  has  not  yet  been  used  for  diagnostic  testing,  but  custom 
interfaces  (portions  of  the  program  that  allow  programmers  to  augment  the  MicroCAT 
system)  have  been  provided  to  allow  diagnostic  testing  routines  to  be  implemented 
within  the  MicroCAT  system. 

The  MicroCAT  Testing  System  is  now  a  commercial  product  and  the  contract 
that  supported  its  development  and  implementation  at  NTC  is  near  an  end.  Its  use  at 
NTC  will  continue  and  the  University  of  Illinois  will  implement  the  diagnostic 
strategies in the near future. It has been a successful implementation that has
demonstrated the relative ease with which the transition can be made from
paper-and-pencil testing methods to computerized testing, even when the new system must be
integrated  into  a  complex  instructional  management  system  that  is  already  in  place. 
APPLICATION OF RULE SPACE IN A NAVY TESTING ENVIRONMENT


by  John  M.  Eddins 

University of Illinois at Urbana-Champaign

Introduction

The  MicroCAT  testing  system  was  installed  in  the  BE&E  School  at 
the  San  Diego  Naval  Training  Center  in  order  to  extend  to  an  actual 
training environment the theoretical work in diagnostic adaptive testing
described  by  Dr.  Tatsuoka,  and  reported  in  detail  in  Tatsuoka  (1985, 
in  press),  Tatsuoka  &  Tatsuoka  (1985)  and  Tatsuoka,  Tatsuoka  &  Baillie 
(1985).  I  will  present  first  a  summary  of  our  overall  plans  for  the 
project,  next  our  progress  to  date,  including  some  of  the  problems 
we  have  encountered,  and  finally  what  we  see  as  the  next  steps. 

Summary  of  Project  Plans 

The  principal  goal  of  the  project  is  to  develop  one  or  more  diagnostic 
adaptive  tests  for  the  BE&E  curriculum  using  the  rule  space  model,  and  to 
evaluate  the  effectiveness  of  these  tests  with  data  collected  from  pilot 
groups of Navy trainees. The overall plan can be outlined in six phases.

1.  Collect  and  analyze  data  from  current  Navy  tests. 

2.  Link  rule  space  procedures  to  the  MicroCAT  system. 

3.  Add  experimental  items  to  current  tests. 

4.  Create  experimental  tests. 

5.  Give  experimental  tests  to  pilot  groups. 

6.  Report  the  results. 

Substantial  progress  has  been  made  on  the  first  two  phases,  and  we 
are  currently  working  on  the  third. 

Progress to Date

1. Data collection and analysis

The  MicroCAT  system  was  installed  and  tested  during  the  summer  of 
1984,  but  ongoing  revisions  of  the  instructional  materials  and  tests 
delayed  full  implementation  until  spring,  1985. 

BE&E  trainees  at  the  San  Diego  base  began  testing  with  the  system 
in late March, 1985, and data were collected for Modules 1, 2, 4, 5,
6 and 7 during April through August, 1985. As it turned out, one more
revision  of  the  tests  was  necessary.  The  revised  tests  are  now  on  line 
and  the  computer  testing  lab  is  back  in  operation. 

As  each  BE&E  trainee  takes  a  test,  the  MicroCAT  system  stores  a  record 
which  includes,  for  each  item,  the  item  number,  answer  key,  student's 
response,  and  time  to  respond.  These  data  are  added  to  a  file  which 
eventually  is  read  to  a  diskette  and  mailed  to  us  at  the  University  of 
Illinois. Our plan is to gather enough data to analyze the items
statistically and estimate item parameters, on the assumption that some
existing  items  can  serve  as  a  starting  point  for  developing  items 
applicable  to  rule  space.  This  requires  a  minimum  of  200  to  300 
subjects,  with  a  significant  percentage  giving  wrong  answers.  Between 
April  and  August  we  collected  data  for  about  1500  subjects;  however, 
these  were  divided  among  five  versions  each  of  six  tests,  leaving  only 
around  fifty  for  each  test  version.  The  different  test  versions  for 
Mod  4  proved  to  be  identical  except  for  the  ordering  of  the  answer  foils, 
so  we  re-ordered  the  data  appropriately  and  merged  it  into  a  composite  set 
giving  us  235  subjects.  An  analysis  of  variance  across  the  different 
forms  confirmed  their  equivalence. 
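The re-ordering and merging step can be pictured with the following sketch. The record layout, the foil maps, and the function name are all assumptions made for illustration; the actual MicroCAT file format and the actual foil orderings of the Mod 4 forms are not given in the text.

```python
def merge_forms(records_by_form, foil_maps):
    """Re-express every response in form-1 foil positions and pool the records.

    records_by_form: {form: [responses, ...]}, where each responses list holds
        one 1-based foil position per item, or None for a skipped item.
    foil_maps: {form: [mapping, ...]}, one dict per item that maps a foil
        position on that form to the matching position on form 1.
    """
    merged = []
    for form, records in records_by_form.items():
        for responses in records:
            merged.append([
                None if r is None else foil_maps[form][i][r]
                for i, r in enumerate(responses)
            ])
    return merged


# Hypothetical two-item example with two forms.
IDENTITY = {1: 1, 2: 2, 3: 3, 4: 4}
FOIL_MAPS = {
    1: [IDENTITY, IDENTITY],
    2: [{1: 2, 2: 1, 3: 3, 4: 4}, {1: 4, 2: 3, 3: 2, 4: 1}],
}
RECORDS = {
    1: [[2, 3]],
    2: [[1, None], [2, 1]],
}
merged = merge_forms(RECORDS, FOIL_MAPS)
```

After this mapping every record can be scored against the single form-1 answer key, which is what allows the small per-form samples to be pooled into one dataset.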

A  summary  of  responses  for  the  items  in  this  dataset  is  shown  in 
Table  1.  A  high  percentage  of  the  responses  on  most  items  are  correct, 
and  several  items  show  essentially  no  discrimination.  These  data  are 
based  on  the  first  pass  through  the  test,  and  they  include  partial  tests 
taken  for  remediation,  hence  the  substantial  number  of  skipped  items. 


Table 1. BE&E Test, Mod. 4 - Summary of Responses. All test forms
equivalent to form 1.

Item#  Key  Rsp1  Rsp2  Rsp3  Rsp4  Skip

   1    1    196     8     1     1    29
   2    4      9    25     0   165    36
   3    3      2    40   125    24    44
   4    1    169     7    19     6    34
   5    1    207     0     1     0    27
   6    3      1    12   189     0    33
   7    3      0     0   202     1    32
   8    1    196     2     2     0    35
   9    4      1     0     2   199    33
  10    3      9     2   188     0    36
  11    1    204     0     9     0    22
  12    2     16   178     6     1    34
  13    3     19    41   109    27    39
  14    3      0     0   212     1    22
  15    1    152    24    11    20    28
  16    4     36    14     0   154    31
  17    4     10    18    45   101    61
  18    4      2    13     2   168    50
  19    3      0     6   139     8    82
  20    3      3    14   168     2    48
  21    3      8     1   177     0    49
  22    2      2   197     1     0    35
  23    3      2     2   187     1    43
  24    2     17   120     0    12    86
  25    4     19     5     1   166    44
  26    2      4   197    15     *    19
  27    1    151    21    42     *    21
  28    3      2    26   186     *    21
  29    1    172    16    20     *    27
  30    2      8   202     6     *    19
  31    1    205     1     6     *    23
  32    1    152    56     5     *    22
  33    1     70   120    23     *    22
  34    3     24    14   175     *    22
  35    2     17   187    12     *    19

*Only three choices for these items.


Our  attempts  to  estimate  item  parameters  were  frustrated  at  first 
because  of  the  limited  number  of  both  items  and  subjects,  and  because 
of  the  large  number  of  skipped  items  resulting  from  partial  tests  taken 
as  remedials.  We  eliminated  the  partial  tests,  leaving  193  subjects, 
selected  22  of  the  most  promising  items  and  succeeded  in  estimating  the 
A  and  B  parameters  for  these  items.  We  used  a  computer  program  created 
by  Yamamoto,  Baillie  and  Tatsuoka  which  implements  an  EM  algorithm 
developed  by  Bock  and  Aitkin  (1981).  The  EM  method  has  the  advantage 
of  being  able  to  estimate  item  parameters  with  relatively  few  items. 
Results  are  shown  in  Table  2. 


Table 2. BE&E Test Item Parameters for Selected Items.

Item#   A (discrimination)   B (difficulty)
   10         .92832             -.49033
   11         .80779             -.76712
   12         .48945             -.84937
   13         .80148             1.00827
   15         .29399             -.72174
   16         .55109              .09081
   18        2.42278              .65190
   19        2.33766              .91749
   20        2.35894              .63324
   21        3.77397              .64296
   23        2.87624              .39554
   25        2.93106              .71003
   26         .72511             -.66316
   27         .86399              .52368
   28        1.07095              .07809
   29         .77854              .10053
   30         .38718            -2.75522
   31         .98740             -.53114
   32         .38085             -.18672
   33        1.16947             1.68272
   34         .88895              .15271
   35         .62432             -.52315
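
To show how A and B values of this kind are read, the sketch below
evaluates an item response function for two of the tabled items. It assumes
a two-parameter logistic form with scaling constant D = 1. The text does
not say whether the estimation program used a logistic or a normal ogive
model, so the function name, the model form and D are all assumptions made
for illustration.

```python
import math


def p_correct(theta: float, a: float, b: float, D: float = 1.0) -> float:
    """Probability of a correct response under an assumed 2PL model.

    theta: examinee ability; a: discrimination; b: difficulty.
    """
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


# (A, B) pairs copied from Table 2: the most discriminating item and the
# easiest item among those selected.
ITEMS = {21: (3.77397, 0.64296), 30: (0.38718, -2.75522)}

if __name__ == "__main__":
    for item, (a, b) in ITEMS.items():
        curve = [round(p_correct(t, a, b), 3) for t in (-1.0, 0.0, 1.0)]
        print(item, curve)
```

Under this assumed model an examinee whose ability equals an item's B value
has an even chance of answering it correctly, and a larger A value makes
the curve rise more steeply around that point.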

2.  Linking  of  Rule  Space  Procedures  with  MicroCAT 


To set up a test to be administered on the MicroCAT system, a bank of
test items is provided, together with a file which lists the items to be
administered and specifies the logical basis on which each successive item
is to be chosen. The choice can range from a simple top-down sequence of
all items on the list to a decision computed by a program external to the
MicroCAT system and passed back to it. Administering a diagnostic adaptive
test using rule space falls in the latter category.

At the beginning of a test, item parameters stored in the item bank
are read into an array in memory (Figure 1). As each test item is
administered, the student's response is stored, and program control is
passed to the rule space programs along with the student response
information. The student's response is scored as right or wrong (1 or 0),
the weighted distances to all remaining items are calculated (Tatsuoka &
Tatsuoka, 1985), and the next item is selected for optimal speed of
convergence.

If sufficient convergence has been achieved, then the error probabilities
are estimated between the current point and the centroids of the ellipses
stored in the bug information bank. If the minimum value of these
probabilities satisfies a specified criterion, the next-item value is set
to "stop"; if not, the next-item value is left unchanged. The next-item
value and program control are then returned to the MicroCAT test driver.
The computer programs for the rule space procedures were developed by
Robert Baillie. Details of the theoretical basis for these procedures will
be described in a future report.
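
The hand-off between the test driver and the external selection programs
can be outlined in code. The sketch below shows the control flow only. The
weighted-distance selection rule, the convergence check and the error
probabilities are passed in as hypothetical functions, because their
definitions are deferred to the future report. Reading "satisfies a
specified criterion" as "falls at or below a threshold" is also an
assumption, as is starting with the first listed item.

```python
from typing import Callable, List, Optional, Sequence

STOP = None  # value handed back to the test driver when testing should end


def rule_space_step(
    responses: List[int],
    remaining: Sequence[int],
    pick_next: Callable[[List[int], Sequence[int]], int],
    converged: Callable[[List[int]], bool],
    error_probs: Callable[[List[int]], Sequence[float]],
    criterion: float,
) -> Optional[int]:
    """One transfer of control from the test driver to the selector.

    Returns the next item number, or STOP when no items remain or the
    stopping rule is met.
    """
    if not remaining:
        return STOP
    next_item = pick_next(responses, remaining)  # stand-in for the distance rule
    if converged(responses):
        # Assumed direction of the criterion: a small minimum error
        # probability ends the test.
        if min(error_probs(responses)) <= criterion:
            next_item = STOP
    return next_item


def run_test(item_ids, answer, **hooks):
    """Minimal stand-in for the test driver; answer(item) returns 1 or 0."""
    responses, remaining = [], list(item_ids)
    item = remaining[0] if remaining else STOP
    while item is not STOP:
        remaining.remove(item)
        responses.append(answer(item))  # scored as right (1) or wrong (0)
        item = rule_space_step(responses, remaining, **hooks)
    return responses
```

Keeping the selector behind a single function call mirrors the arrangement
described above, in which the driver needs to know only the next item
number or the stop signal.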


Future  Plans 


The principal hurdle remaining before we can actually create and
install  experimental  test  items  is  a  detailed  analysis  and  verification 
of  specific  learning  tasks  to  be  addressed,  together  with  hypotheses 
as  to  causes  and  results  of  various  types  of  errors.  Designing  a 
diagnostic  test  for  use  with  the  rule  space  procedures  requires  that 
the  test  items  be  constructed  with  specific  characteristics,  and  these 
item  characteristics  must  be  analyzed  and  verified  with  extensive  data 
collection  and  computer  analysis.  Parallel  versions  of  the  items  also 
are  needed. 

At  present  we  are  examining  in  detail  the  concepts,  operations  and 
errors  represented  by  the  Navy's  BE&E  tests.  Three  items  selected  from 
the  Mod  4  test  will  illustrate  something  of  the  nature  and  complexity 
of  the  task.  In  the  first  item  (A),  a  simple  series  circuit  containing 
five  resistors  is  shown  and  the  student  is  asked  the  effect  on  voltage 
drops  at  an  open  resistor  —  increase  (1),  decrease  (2)  or  no  change  (3). 
The  second  item  (B)  shows  the  same  circuit  and  answer  choices,  but  asks 
the  effect  on  current  at  a  shorted  resistor.  The  third  item  (C)  shows 
a  series  circuit  with  values  given  for  applied  voltage  and  two  of  the 
resistors,  one  of  which  is  open.  The  student  is  asked  what  a  voltmeter 
across  the  open  will  indicate. 

For  item  A,  56%  of  the  students  chose  (2)  —  decrease  in  voltage 
drop  at  an  open;  for  item  B,  26%  chose  (2)  —  decrease  in  current  at 
a  short;  for  item  C,  18%  chose  (1)  —  zero  volts  read  across  an  open. 

Out  of  52  students  choosing  (2)  for  item  B,  34  also  chose  (2)  for  item  A; 
out  of  36  choosing  (1)  for  item  C,  27  also  chose  (2)  for  item  A;  but  out 
of  the  total  group  only  5  chose  all  three.  Apparently,  items  B(2)  and 
C(1) represent different errors, while item A(2) subsumes both errors.
This  is  also  confirmed  intuitively.  Response  (2)  to  item  B  probably 
results  from  confusing  shorts  with  opens  (the  result  of  a  short  often 
is  an  open).  Response  (1)  to  item  C  probably  represents  confusion 
about  the  nature  of  voltage  drop  (nothing  "happens"  across  a  break,  so 
voltage  drop  =  0).  These  are  different  misconceptions,  yet  response  (2) 
to  item  A  can  result  from  either  one. 
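
The overlap argument is simple arithmetic on the counts quoted above, and
it can be checked as follows. The variable names are ours; the counts come
from the text.

```python
# Counts reported in the text for Mod 4 items A, B and C.
chose_b2 = 52          # students choosing response (2) on item B
chose_b2_and_a2 = 34   # of those, students who also chose (2) on item A
chose_c1 = 36          # students choosing response (1) on item C
chose_c1_and_a2 = 27   # of those, students who also chose (2) on item A
chose_all_three = 5    # students making all three choices

# Share of each error group that also made the item A error.
p_a2_given_b2 = chose_b2_and_a2 / chose_b2
p_a2_given_c1 = chose_c1_and_a2 / chose_c1

# Share of the B(2)-and-A(2) group that also chose C(1).
p_c1_given_a2_b2 = chose_all_three / chose_b2_and_a2

if __name__ == "__main__":
    print(f"P(A2 | B2)     = {p_a2_given_b2:.3f}")
    print(f"P(A2 | C1)     = {p_a2_given_c1:.3f}")
    print(f"P(C1 | A2, B2) = {p_c1_given_a2_b2:.3f}")
```

Roughly two thirds of the B(2) group and three quarters of the C(1) group
also chose A(2), while few students fall in both groups. That is the
pattern expected if B(2) and C(1) mark separate misconceptions that each
lead to A(2).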

The  task  before  us  is  to  pursue  this  type  of  inquiry  until  we  have 
isolated  and  charted  a  sufficient  number  of  such  relationships  among 
items  to  enable  us  to  describe  several  common  errors  and  specify  test 
items  accordingly.  We  also  are  investigating  ways  to  utilize  the 
computer to speed up and simplify this otherwise tedious and complex task.
What  we  are  looking  at  here  is  the  relationship  among  just  one  foil  for 
each  of  three  items.  To  systematically  consider  such  relationships 
among  all  foils  for  all  items  obviously  is  a  task  for  a  computer.  The 
critical  human  task  is  defining  the  relevant  concepts  and  errors. 

With  a  multiple  choice  test  it  is  important  that  each  foil  be 
designed  to  provide  specific  information,  so  that  the  choice  of  the 
next  item  will  be  based  on  more  than  simply  a  right-wrong  score.  Our 
first  experimental  test  items  will  likely  be  items  taken  from  the  current 
tests  with  some  changes  in  the  foils  or  the  circuit  values.  We  have  the 
facility  to  insert  these  items  into  the  existing  tests  without  disturbing 
the  regular  scoring.  We  could,  for  instance,  insert  five  extra  items 
at  the  end  of  the  Mod  4  tests  which  never  would  appear  in  the  Navy's 
computer  records  but  would  provide  us  with  response  data. 

As a simple illustration, assume that an item such as A is found
to  suggest  four  possible  errors.  We  present  this  item  first,  followed 
by  several  others  which  confirm  some  errors  and  eliminate  others.  This 
can  be  diagrammed  in  a  process  network  where  each  node  represents  a 
decision  branch.  By  tracking  these  decisions  the  computer  can  identify 
the  student's  errors,  provided  the  student  is  entirely  consistent  and 
is  following  a  predicted  route.  Unfortunately,  the  realities  of  human 
behavior  do  not  conform  to  such  a  deterministic  model.  By  using  pattern 
classification  techniques,  the  rule  space  approach  adds  an  element  of 
probability  to  the  model,  with  the  promise  of  much  better  rates  of  error 
detection. 


INTRODUCTION


Group-Administered  Aptitude  Tests 

Group-administered aptitude tests can be described as conventional,
paper-and-pencil tests and Computerized Adaptive Tests (CAT). As the names
indicate, these differ in administration mode. Less obviously, but perhaps
more importantly, they also differ in the way in which items are selected for
administration. In the usual paper-and-pencil test, all examinees are
administered the same items in the same sequence. In contrast, a CAT
instrument is dynamically tailored to the measured ability level of the
individual examinee during the course of the test administration. This means
that, at least potentially, every individual receives a different test.

Typically,  in  a  CAT  administration,  the  first  item  selected  for 
administration  is  one  of  medium  difficulty,  since  we  know  nothing  about  the 
examinee's  ability  level.  If  the  examinee  responds  correctly,  the  ability 
estimate is raised to above average, and a more difficult item is selected for
administration.  If  the  examinee  answers  this  second  item  incorrectly,  the 
ability  estimate  is  lowered  somewhat  through  the  updating  procedure.  As  a 
result,  an  easier  item  is  selected  as  the  third  question.  This  process  of 
selecting  an  item,  scoring  the  examinee's  response,  updating  the  ability 
estimate,  and  choosing  the  next  item  for  administration  continues  until  some 
stopping  rule  is  reached.  This  test  termination  criterion  may  be  either  the 
administration  of  a  prespecified  number  of  items  (fixed  length  testing),  or  the 
administration  of  items  until  the  ability  estimate  meets  a  prespecified  level 
of  precision  (variable  length  testing). 
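
The select, score, update and stop cycle can be illustrated with a short
sketch. It is not the CAT-ASVAB algorithm: the two-parameter logistic
model, the grid-based posterior update, the maximum-information selection
rule and all function names are assumptions chosen to keep the example
small.

```python
import math

GRID = [g / 10.0 for g in range(-40, 41)]  # ability grid from -4 to +4


def p_correct(theta, a, b):
    """Assumed 2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def information(theta, a, b):
    """Item information at theta under the assumed 2PL model."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def update(posterior, a, b, correct):
    """Bayes update of the ability distribution after one scored response."""
    like = [p_correct(t, a, b) if correct else 1.0 - p_correct(t, a, b) for t in GRID]
    post = [w * l for w, l in zip(posterior, like)]
    total = sum(post)
    return [w / total for w in post]


def mean_and_sd(posterior):
    mean = sum(t * w for t, w in zip(GRID, posterior))
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, posterior))
    return mean, math.sqrt(var)


def run_cat(bank, answer, max_items=10, target_sd=None):
    """bank: {item_id: (a, b)}; answer(item_id) returns True or False.

    Stops after max_items (fixed length) or, if target_sd is given, as soon
    as the posterior SD reaches it (variable length).
    """
    prior = [math.exp(-0.5 * t * t) for t in GRID]  # standard normal prior
    total = sum(prior)
    posterior = [w / total for w in prior]
    theta, sd = mean_and_sd(posterior)
    unused, administered = dict(bank), []
    while unused and len(administered) < max_items:
        # Select the unused item that is most informative at the current estimate.
        item = max(unused, key=lambda i: information(theta, *unused[i]))
        a, b = unused.pop(item)
        posterior = update(posterior, a, b, answer(item))
        theta, sd = mean_and_sd(posterior)
        administered.append(item)
        if target_sd is not None and sd <= target_sd:
            break
    return theta, sd, administered
```

With a prior centered on average ability, the first item chosen is one of
medium difficulty. A correct answer raises the estimate and leads to a
harder item, and an incorrect answer does the reverse, as described above.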

Armed  Services  Vocational  Aptitude  Battery  (ASVAB) 

The ASVAB is a conventionally-administered, paper-and-pencil aptitude test
battery used by all the U.S. military services for both enlistment eligibility
screening and for subsequent classification and placement into entry-level
training. The paper-and-pencil version of the battery (P&P-ASVAB) includes
eight power tests and two speeded tests. Administration of P&P-ASVAB
takes about three and one-half hours.

The P&P-ASVAB is administered under two large-scale testing programs. The
Production Testing Program involves the administration of the battery in the 68
Military Entrance Processing Stations (MEPS) and in about 900 Mobile Examining
Team (MET) sites located across the country. The Student Testing Program is
administered in about 14,000 high schools. These two testing programs are quite
large, each involving the administration of the battery to between 800,000 and
1,000,000 persons annually.

COMPUTERIZED  ADAPTIVE  TESTING  VERSION  OF  ASVAB  (CAT-ASVAB) 


Objectives

The Computerized Adaptive Testing (CAT-ASVAB) Program has two broad
objectives. The first involves the development of a system that automates the
administration, test scoring, and computation of the Armed Forces Qualification
Test (AFQT) score and various other composite scores derived from ASVAB used by
the individual military services. Such a system must be capable of use in both
the fixed-base MEPS and in the portable testing environment of the MET sites,
while interfacing with the existing score reporting system. The second
objective of the CAT-ASVAB program is to evaluate the suitability of CAT-ASVAB
as a replacement for the P&P-ASVAB in the Production Testing Program.

Approach 

The original approach to the development of the CAT-ASVAB System was a
three-stage competitive "flyoff" between three contractors from private
industry. During the first stage, the three contractors developed system
design concepts and supporting analyses. In the second stage, limited
production models were to be developed, field-tested, and evaluated. The final
stage would have involved one of the three original contractors going into
full-scale production, deployment, and implementation.

The approach has changed as a result of three factors. First, the
timelines submitted by the contractors for Stage 2 were considerably longer
than we had planned. Second, some remarkable advances have been made in
microcomputer technology during the past few years. Finally, LTGEN E. A.
Chavarrie, Deputy Assistant Secretary for Military Manpower and Personnel
Policy, in his keynote address at the last MTA convention in Munich, provided
strong encouragement for reducing the long timelines, commensurate with meeting
the performance objectives of the program. As a result of these influences, we
have adopted a markedly different approach. With a focus on early
implementation, we have initiated work on the Accelerated CAT-ASVAB Program
(ACAP).


ACCELERATED  CAT-ASVAB  PROGRAM  (ACAP) 


Objective

The objective of ACAP is to field-test CAT-ASVAB as soon as possible. In
pursuit of this objective, we will procure off-the-shelf,
commercially-available microcomputer equipment. Software will be designed and
developed in-house at NPRDC.


Evaluation  Criteria 


ACAP will be designed to meet the nine evaluation criteria originally
established for the full-scale version of CAT-ASVAB: (1) performance, (2)
suitability, (3) reliability, (4) maintainability, (5) ease of use, (6)
security, (7) affordability, (8) flexibility/expandability, and (9)
psychometric acceptability. Each of these nine major criteria includes
numerous subcriteria.

The performance criterion includes both general and specific requirements.
The system must automate the current P&P-ASVAB functions and anticipated
additional functions of CAT-ASVAB. System response time cannot exceed a
maximum of two seconds, and this response time must be independent of the
number of examinees taking the test (system load). The display must have a
resolution of 400x300 pixels, and the test must be displayed in 7x9 characters.
The system must support an interface with the existing MEPS Reporting System,
and the CAT-ASVAB Maintenance and Psychometric (CAMP) facility. The computer
software should employ a "top-down", structured design, use a high-level
language, and be adequately documented.

Suitability  requires  the  system  to  operate  in  a  normal  office  environment 
in  terms  of  temperature  and  humidity.  No  significant  modification  to  existing 
facilities  should  be  required  (e.g.,  electrical  power).  There  should  be  no 
necessity  for  specially  skilled  operators  and  no  significant  staffing  changes 
should  be  required.  Finally,  the  system  must  be  portable  to  support  testing  in 
the  MET  site  environment. 

Reliability  is  an  important  concern  for  the  system.  It  is  imperative  that 
the  system  be  available  and  operate  reliably  for  scheduled  testing  sessions. 
The  system  must  be  capable  of  restarting  from  the  point  of  failure  and 
recovering  from  failure  without  loss  of  data. 

Maintainability  is  a  major  consideration  for  any  large  scale  computer 
system.  No  skilled  technicians  are  to  be  required.  The  hardware/software  must 
incorporate  self-diagnostic  capabilities  which  can  be  readily  understood  by 
test  administrators.  An  adequate  integrated  logistics  support  system  must  be 
established  and  maintained  for  the  life  of  the  system. 

The  system  must  be  easy  to  use,  both  for  the  test  administrator  and  the 
examinee.  No  computer  experience  or  expertise  should  be  required.  Set-up 
procedures  should  be  clear,  unambiguous,  and  adequately  documented.  The 
display  legibility  and  resolution  should  support  both  text  and  graphics 
material.  Introduction  of  experimental  test  items  should  be  transparent  to 
both  the  examinee  and  the  test  administrator. 

Security  is  an  important  system  consideration.  The  item  sequence  should 
be  unpredictable.  Measures  must  be  taken  to  prevent  printout  or  inspection  of 
the  item  files.  Use  of  the  system  must  be  limited  to  authorized  personnel.  A 
multi-level  password  access  procedure  should  be  implemented.  System  access 
must  produce  an  audit  trail  which  can  be  inspected  by  system  managers. 
Finally, the system should be designed to minimize equipment theft, by reducing
or eliminating pilferable components.

The system must be affordable, i.e., the life-cycle cost of CAT-ASVAB must
be  comparable  to  that  of  P&P-ASVAB.  At  present,  an  economic  analysis  of  the 
system  is  being  conducted  under  contract. 

Flexibility/expandability  is  an  important  dimension  of  the  system.  In 
addition  to  supporting  the  delivery  of  a  CAT  version  of  ASVAB,  the  system  must 
allow  for  future,  add-on  peripheral  devices.  While  present  plans  call  for  the 
examinee  to  use  a  specially-designed  keypad  input  device,  the  system  must 
support  standard  keyboard  input.  A  programmable,  high-precision  clock  is 
essential,  as  future  testing  will  almost  certainly  involve  the  measurement  of 
response  latency.  Other  future  testing  possibilities  include  provision  to 
measure  a  person's  ability  to  identify  and  track  a  moving  target.  This  will 
require  a  system  capable  of  internal  and/or  external  expansion  and  provisions 
for  additional  interfaces  (e.g.,  a  joystick). 

Finally, the system must be psychometrically acceptable. CAT-ASVAB must
measure the same aptitudes measured by P&P-ASVAB. The CAT-ASVAB and P&P-ASVAB
versions of the battery must be equated to ensure that the scores are
interchangeable. The CAT-ASVAB system must meet stringent professional test
standards.

Progress/Plans

We  have  already  achieved  several  goals.  We  have  developed  the  functional 
requirements  for  the  system.  An  equating  plan  has  been  developed,  and  is 
currently  under  review  by  policy  and  technical  representatives  from  each  of  the 
services and the U.S. Military Entrance Processing Command (USMEPCOM). We have
procured  a  small  number  of  development  systems  to  begin  software 
design/development.  A  procurement  action  to  obtain  equating  systems  has  been 
initiated  and  is  in  progress. 

Plans  include  completion  of  the  design  and  development  of  the  ACAP 
software  in  1986.  Data  collection  and  extensive  statistical  analyses  will  take 
about  a  year.  We  intend  to  begin  the  initial  operational  test  and  evaluation 
in  selected  MEPS  (and  their  satellite  MET  sites)  in  1987. 

CONCLUSION 

There have been a number of important changes in the development of the
CAT-ASVAB system since the last Military Testing Association meeting in
November 1984. Emphasizing the importance of demonstrating the extensive
capabilities of an adaptive testing system for ASVAB, we are concentrating on
the Accelerated CAT-ASVAB Program (ACAP). However, ACAP is viewed as an
interim system, not as a replacement for the full-scale development of
CAT-ASVAB. We plan to use "lessons learned" from the limited deployment of
ACAP to strengthen our functional requirements specifications for the
development of the full-scale CAT-ASVAB system.


NAVPERSRANDCEN is currently involved in a major system development effort
concerned with the research, development, and eventual implementation of a
Computerized Adaptive Testing version of the Armed Services Vocational
Aptitude Battery (CAT-ASVAB). The goal of this effort is the implementation of
CAT-ASVAB on a nationwide distributed computer network. This network will
permit the United States Military Entrance Processing Command (USMEPCOM) to
adaptively administer the ASVAB to civilian applicants for military service.
The CAT-ASVAB System is intended to replace the operational paper-and-pencil
battery (P&P-ASVAB) currently used for selection and classification of
enlisted personnel.


P&P-ASVAB testing currently occurs at 68 Military Entrance Processing Stations
(MEPS), two substations and approximately 900 field locations identified as
Mobile Examining Team (MET) sites. The MEPS/MET sites are under the
administrative responsibility of USMEPCOM. CAT-ASVAB System components will be
used at both the MEPS and the MET testing sites. The system must provide an
automated, on-line system for test delivery and score reporting using
adaptive, conventional, and timed psychometric tests. Item response theory
(Lord, 1980) constitutes the theoretical foundation for CAT-ASVAB adaptive
testing.

CAT-ASVAB  System  Concept 

In order to propose a system concept/design for a Local CAT-ASVAB Network
(LCN), it is necessary to have available a set of specifications and standards
for the performance of the system once implemented. Fortunately, a
government-specified set of performance standards exists, and it has been
documented in the CAT-ASVAB Stage 2 Full Scale Development (FSD) Request for
Proposal. This document outlines the functional requirements for the
development of computer hardware specific to CAT-ASVAB functions. Before the
FSD is implemented, an early, smaller-scale development known as the
Accelerated CAT-ASVAB Project (ACAP) will be completed to provide pilot data
in support of the CAT Stage 2 FSD. The scope of the ACAP effort will conform
as much as possible to the CAT-ASVAB Stage 2 FSD. However, the CAT-ASVAB
functions to be addressed under ACAP will be dependent on the computer
hardware design that is selected. Unlike the FSD, it is not a goal of ACAP to
develop computer hardware specific to CAT-ASVAB functions. Rather, a system
design will be selected from a set of candidate designs, and then
commercially-available computer systems will be surveyed in order to identify
the most appropriate system to meet the functional requirements of CAT-ASVAB.

CAT-ASVAB Operational Requirement

The primary requirement of the CAT-ASVAB System is that it be capable of
administering a battery of instruments equivalent to the present components of
the P&P-ASVAB. The current production battery (P&P-ASVAB) includes 10 tests;
eight of these are cognitive power tests; two are speeded tests. However, once
it is implemented, CAT-ASVAB will be capable of administering other cognitive
and non-cognitive operational and experimental instruments, as determined by
DoD policy.

At present, 20 percent of the production P&P-ASVAB testing occurs at MEPS; the
remaining 80 percent occurs at MET test sites.

MEPS P&P-ASVAB testing is currently but one part of the processing of
applicants for enlistment, and it occurs at fixed-site locations in relatively
controlled environments. Experienced and well-trained examiners conduct the
testing sessions. In contrast to the MEPS testing environment, the MET site
testing is, in a large number of cases, administered by an Office of Personnel
Management (OPM) employee working under a service agreement with DoD. For the
most part, MET site testing is conducted in borrowed facilities on an ad hoc
basis. USMEPCOM has no permanent control over the MET facilities, and no
authority to modify them. Thus, the CAT-ASVAB System must be usable by
non-USMEPCOM examiners, who must set up and take down, and possibly even
transport, the CAT-ASVAB testing equipment to and from non-USMEPCOM facilities
as required to support examining schedules.

Functional  Requirements 

Based on the functional specifications stated in the CAT Stage 2 RFP, it is
intended that a Local CAT Network (LCN) be developed that would permit the
administration of CAT-ASVAB to civilian applicants at any of the MEPS or MET
sites within CONUS. An LCN would consist of up to 24 Examinee Test (ET)
Stations linked to a single Test Administration (TA) Station via a hard-wired
electronic telecommunications line. In addition, a Data Handling Computer
(DHC) would reside at each MEPS to support the telecommunications function
among LCN units located at the MEPS and the MET sites for that MEPS, and with
USMEPCOM MEPS computer systems to be used for archiving CAT-ASVAB data.

MEPS Sites. The functional capability that is required at the MEPS, in terms
of psychometric testing, is identical to that required at the MET sites.
Specifically, MEPS equipment is stationary, but identical to MET site
equipment. Identical LCN components at MEPS and MET sites are necessary to
accommodate the equating of CAT-ASVAB, commonality of equipment for software
and hardware maintenance purposes, and to permit the cost-effective sharing of
equipment across both types of sites. In contrast to most MET sites, each
examiner at a MEPS testing site must be able to monitor up to 24 ET Stations
in any LCN.

The MEPS site implementation of CAT-ASVAB must also include a DHC unit. The
main function of the DHC is to collect data daily from each LCN within the
associated MEPS administrative segment, including LCNs located at MET sites.
Data is transmitted to the DHC either over a hard-wired connection (in the
case of MEPS LCNs) or by modem (in the case of MET LCNs).

MET Sites. At the MET sites, transportable computer systems will be used to
administer CAT-ASVAB. The hardware configuration is to be based on the concept
of a "generic" LCN. This generic LCN will consist of six ET Stations being
monitored by a single TA Station, including any peripheral equipment. Note
that a single TA Station must be able to monitor many more (up to 24) ET
Stations while still maintaining CAT-ASVAB performance requirements. In
addition, it is important that the selected equipment support the CAT-ASVAB
Stage 2 portability requirements; i.e., number of packages and weight
requirements for a generic LCN (no more than eight components weighing a total
of no more than 120 lbs., each component weighing 23 lbs). Environmental
performance requirements such as temperature, humidity, etc. must also be met.

The computer hardware configuration for an LCN may be described as follows:
Each ET Station would consist of a response device, a screen display, and
include access to sufficient RAM and/or data storage to permit the
administration of any CAT-ASVAB test; the amount of RAM required depends on
the particular application software and networking design being used to
implement the functions. Each ET Station would be tied into a TA Station by
networking cables; the TA Station being essentially an ET Station with a mass
storage device and full-sized keyboard attached. Finally, a single (very
portable) printer and modem for the TA Station would complete the complement
of equipment composing the LCN.

The operational requirement for an LCN is to administer CAT-ASVAB to those
military applicants scheduled for testing at the MET site. Initially, an
Office of Personnel Management (OPM) examiner may be required to pick up the
LCN equipment at a staging area, if it were not secured at the testing site
itself. The equipment would be transported to the test site, carried from the
vehicle to the test site, and configured, ready for testing by the examiner.
Once all examinees have completed testing, the examiner will attempt to
telecommunicate examinee personal and test item response data to the DHC unit
at the associated MEPS. This will be done using a modem and dial-up telephone
line, if available at the test site. If this is not possible, the data will be
transferred once the equipment is returned to the staging area.

Finally,  if  the  equipment  were  not  secured  at  the  testing  site,  the  examiner  would  package 
the  equipment  into  transportable  packages,  carry  these  packages  to  a  vehicle,  and  return  the 
equipment  to  the  staging  area. 

Design  Considerations 

It  can  be  concluded  from  the  survey  of  the  requirements  for  developing  a  CAT-ASVAB 
System,  that  design  efforts  must  be  focused  on  the  requirements  for  a  generic  LCN.  This 
mainly  includes  the  portability  and  functional  capability  of  an  LCN. 

Portability.  With  respect  to  the  portability  aspect  of  an  LCN,  it  should  be  clear  that  the 
response  of  the  ACAP  system  to  this  requirement  will  be  contingent  on  the  capabilities  of 
currently  available  commercial  computer  hardware.  The  ACAP  System  will  not  include  the 
development  and/or  the  building  of  computer  systems  to  meet  CAT-ASVAB  needs,  but  rather 
will use commercially available computer hardware to support the accelerated field-test effort.

On-Line Data Storage. The on-line data storage requirement is the factor which will most
influence the selection of computer hardware to support the ACAP System development.
On-line data storage requirements will also significantly affect the design of the software
for the ET, TA, and DHC units. The TA and ET Stations must have access to the on-line
storage of two Forms of the CAT-ASVAB. Based on an estimate of 100 items per test, each
Form will require approximately 850 Kilobytes (KB) of storage, assuming that the data are
stored as a sequential file or equivalent. Actual data storage requirements would be 30-50%
higher if the system design required that the data be stored as a random access file. Only one
Form must be available to an ET Station during a test session. Additional storage for use by
the TA and ET Stations will have to be allocated for two experimental item pools of 170 KB
each (one pool for each Form); an experimental item set derived from the experimental item
pool (10-125 KB); application software (approximately 250 KB per unit); two survey
questionnaire item banks (80 KB for both questionnaires); examinee personal data, such as the
information required on the USMEPCOM 714-A Form; and examinee personal and test item
response data (15 KB per examinee).
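
The storage figures quoted above can be tallied in a few lines. The following is only an illustrative sketch: the function name, the assumption that an ET Station holds one Form while a TA Station holds two, the assumption that one experimental pool accompanies each Form held, and the choice to apply the random access overhead to the Form data alone are not part of the ACAP specification.

```python
# Storage budget sketch using the sizes quoted in the text (all in KB).
FORM_KB = 850               # one CAT-ASVAB Form stored as a sequential file
EXPERIMENTAL_POOL_KB = 170  # one experimental item pool (one per Form)
SOFTWARE_KB = 250           # application software, per unit
QUESTIONNAIRES_KB = 80      # both survey questionnaire item banks together
EXAMINEE_KB = 15            # personal and response data, per examinee


def storage_kb(forms, item_set_kb, examinees, random_access_overhead=0.0):
    """Estimate the on-line storage (KB) needed by one station.

    forms: number of Forms held on-line (assumed 1 for ET, 2 for TA).
    item_set_kb: experimental item set size, 10-125 KB per the text.
    examinees: number of examinee records held at the station.
    random_access_overhead: 0.30-0.50 if Forms are kept as random access
        files; assumed here to apply to the Form data only.
    """
    form_data = forms * FORM_KB * (1.0 + random_access_overhead)
    pools = forms * EXPERIMENTAL_POOL_KB
    return (form_data + pools + item_set_kb + SOFTWARE_KB
            + QUESTIONNAIRES_KB + examinees * EXAMINEE_KB)


if __name__ == "__main__":
    print(storage_kb(forms=1, item_set_kb=125, examinees=1))   # ET Station
    print(storage_kb(forms=2, item_set_kb=125, examinees=24))  # TA Station
```

Under these assumptions, a one-Form ET Station with the largest experimental item set and a single examinee record needs about 1,490 KB, and a two-Form TA Station holding 24 examinee records needs about 2,855 KB.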

Test  Administration  Time.  Depending  on  the  network  that  is  proposed  to  support  ACAP,  the 
time  required  to  administer  a  CAT-ASVAB  test  could  also  affect  the  minimum  response  time 
required  to  have  test  data  for  the  next  item  available  at  ET  Stations. 

Candidate  Local  CAT-ASVAB  Network  Designs 

Using the preceding design considerations as a guide, the current commercial market in
portable (or, at least, transportable) computer systems indicates that there are three basic
designs upon which to develop a workable ACAP System. For the purpose and scope of the
ACAP effort, the discussion will focus on the storage (and retrieval) capability of the
candidates with respect to the 850 KB test item bank. The important consideration is that
the examinees have available (within the response time specified in the Stage 2 RFP) the
correct test item as dictated by the underlying item selection strategy being used at the time.

The basic designs to be discussed are generic designs and, as such, may not be entirely
represented by an actual example on the commercial market. The examples provided are those
system designs that come very close to representing the generic design being discussed.

Design #1 - CAT-ASVAB Items Stored on Removable Media

In this type of design, each ET Station would have sufficient internal storage on
removable media (e.g., 3 1/2 in. micro-floppy diskettes) to accommodate the entire test
item bank. Internal RAM would be about 512 KB. Necessarily, the test item bank
files would have to be encrypted or "scrambled" on the removable media. The ET Station
would be very portable, weighing 11-17 lbs., and would include a flat panel Liquid Crystal
Display (LCD), electroluminescent, or equivalent low-weight display. The computer system
that may currently best illustrate this hardware configuration is the Data General/One.

The main advantage of this type of configuration is that it is a very portable system.
The total weight for a generic LCN could be as low as 80 lbs., including seven very
transportable components (assuming that the LCN networking requirement was suppressed).

There are many disadvantages to this design.

1. Security. The entire test item bank must reside on the removable media, necessarily
jeopardizing the security of the test item bank files.

2. Media Updating. Each ET Station will require two removable diskettes installed in the disk
drives to accommodate CAT-ASVAB testing. If this design were installed at the
ACAP field-test MEPS and MET sites, approximately 400 ET Stations would be involved.
This would require 800 micro-floppy diskettes to be inventoried and secured, creating a very
large media creation, distribution, and security problem each time the test item bank is updated.

3. Ease of Use. Use of removable storage media will require a significant amount of operator
intervention to insert and remove diskettes. For each ET Station, two movements are required to
"boot" the computer and to receive testing software. Eight diskette movements are required
to transfer the examinee's response data to the examiner's work diskette; two movements can
be avoided for subsequent examinees. A MET site LCN testing 10 individuals with six ET
Stations available would require at least 100 movements.

4. Maintenance. Since the micro-floppy drives are in constant use during the testing process,
system maintenance may be higher than for some other configurations in terms of disk drive
maintenance and diskette replacement. Each test item displayed will require at least one disk
access.

Design #2 - CAT-ASVAB Items Stored on a Central File Server

This type of LCN design is configured around a central file server (e.g., a hard disk) which
acts as the repository for the CAT-ASVAB item banks and supporting data files. In this type
of design, the capability of the network to move data from the file server
to each examinee station is of paramount importance. A minimal amount of RAM is available
at the ET Station (less than 512 KB), with perhaps one internal floppy disk drive. A
central file server is required because each ET Station cannot support the entire data storage
requirement without adding significantly more components to the overall network. Typically,
the ET Station is bulkier and may not support the latest flat screen technology (e.g., LCD or
electroluminescent displays). The Macintosh Computer by Apple Corp. is an example of this
type of design. In general, any configuration of equipment requiring a central file server would
be an example of this design.

Perhaps the main advantage of this type of network is that a very sophisticated networking
capability must be installed in order to "make it work." Because this capability is available, one
could (theoretically) also install a TA Station monitoring capability. The monitoring capability
would have to be installed in such a fashion as not to compromise the response time
requirements for the display of test items at the ET Stations in the LCN. Admittedly, the
monitoring function could be very useful at certain large MEPS, in which many
examinees are being examined simultaneously. Unfortunately, this is also the situation in which
the ET Station response time requirement would be most compromised. However, this system
could be the least expensive of the networks being investigated. The cost of the file server
could be distributed over the ET Stations, each of which could function without any removable
media being available. Another advantage (as opposed to Design #1) is that the movement of test
item bank data and/or examinee response data to and from the ET Station is automatic and does
not require examiner intervention. Once the network is set up and working properly, the
examiner's tasks are minimized with respect to data movement requirements.

The main disadvantage of this design concerns the reliability of the file server
itself; when the file server fails, the entire network is inoperable. To meet the CAT-ASVAB
Stage 2 reliability requirements, it will be necessary to include two identical file servers in the LCN.
This means that the system is heavy, environmentally intolerant, and requires a large number
of components to be transported. In addition, for MET sites in which the equipment has to be
assembled and disassembled for each testing session, a heavy requirement will be placed on
the examiner to serve as a computer operator. This could conceivably result in a
reclassification of the OPM examiner position.

Another disadvantage is that the system response time and monitoring requirements are
functionally related in Design #2. It is very difficult to imagine a network operating system
that can simultaneously accommodate these requirements in a fully loaded LCN, with 24 ET
Stations attached to a single TA Station. What is the maximum response time at any ET Station
when all examinees are requesting items simultaneously from the same source (i.e., the file
server)? Note that sufficient time must also be allowed for decryption of the items before
display, as well as decompression of graphics items, as necessary. In summary, another
disadvantage of Design #2 is that the maximum system response time is relatively large compared
to other system designs, and therefore is a potentially compromising consideration relative to
hardware selection for purposes of ACAP.
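
The question of maximum response time can be bounded with a rough calculation. The sketch below assumes, purely for illustration, that the file server handles one request at a time in arrival order and that decryption and decompression happen at the ET Station after the transfer. The function name and the timing values in the example are hypothetical; the paper gives no measured figures.

```python
def worst_case_response_s(stations, transfer_s, decrypt_s, decompress_s=0.0):
    """Worst-case wait (seconds) when every station requests an item at once.

    Assumes the server serves requests one at a time, first come, first
    served, so the last station waits for every transfer ahead of it and
    then decrypts (and, for graphics items, decompresses) its own item.
    """
    if stations < 1:
        raise ValueError("stations must be at least 1")
    return stations * transfer_s + decrypt_s + decompress_s


if __name__ == "__main__":
    # Hypothetical timings: 0.25 s per item transfer, 0.5 s to decrypt.
    print(worst_case_response_s(stations=24, transfer_s=0.25, decrypt_s=0.5))
    print(worst_case_response_s(stations=1, transfer_s=0.25, decrypt_s=0.5))
```

With these made-up timings, the last of 24 stations waits 6.5 seconds while a lone station waits 0.75 seconds, which illustrates why the fully loaded LCN is the case of concern.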

Design #3 - Test Item Banks Stored in Examinee Test Station Random Access Memory

The TA and ET Stations in this design would have a large amount of internal RAM,
on the order of at least 1.5 MB. The ET Station would be supported by one micro-floppy drive
and would probably include the latest in electroluminescent or LCD display. Because it is
capable of operating independently, the ET Station would recover readily from a network
failure. In addition, as a networking capability will be available, the TA Station
could perform the functions of an "electronic" file server. Total RAM available on the
TA Station could be 1.5 to 3.5 MB (preferably higher), allowing for great flexibility in the total
number of alternate forms available during any one test session.

Note that several removable media are required in order to "boot" the systems. However, a
total of no more than two micro-floppy diskettes are required to store the test item banks
(per Form) and supporting data files; each ET Station would also require one micro-floppy to
be installed as a "working" diskette for failure recovery purposes. Normally (after initial
"boot-up" at the beginning of a test session), no micro-floppy diskette movements are required
of the examiner; i.e., the network would accomplish all data movements.

The main advantage of this design is that it offers a large degree of flexibility with respect
to design options. The ET Stations are capable of operating as stand-alone devices and, as
such, it is virtually impossible for an examinee's test session to fail to be completed; each ET
Station backs up every other ET Station. For this reason, and because it minimizes accesses to a
mechanical device, Design #3 should be the most reliable of the network designs discussed. In
addition, this design offers a very high level of security. Once power is removed from the
computer, the volatile random access memory (RAM) is erased. This provides dependable security
for the test item banks. Furthermore, as noted above, only two removable micro-floppy
diskettes are required per Form, regardless of the number of stations in the LCN.

Another important consideration, in comparison to Design #2, is that the LCN monitoring
and the system response time requirements are not functionally related. In addition, it is
possible to configure a collection of computer hardware for Design #3 that permits the entire 850
KB test item bank (for a Form) to reside at the ET Station. Therefore, the response time
required to display test items will be network independent. The item display process will be
accomplished at RAM speed, resulting in a maximum response time that is on the order of
milliseconds, as opposed to seconds.

Generally speaking, during actual test administration (up to the point of failure recovery), the
advantages of Design #3 include: a) no removable media are required, b) minimization of the need
to use mechanical devices, c) a high level of security for test item banks, d) excellent system
response time characteristics, and e) decryption of the test item bank need occur but once, when
this information is initially transferred to the "electronic" file server. In addition, some of the
better features of Design #2 are also characteristic of this design: a) examiner intervention to
move data in an LCN is not required, and b) TA Station monitoring capability can be
automated.

The primary disadvantage of this design is that it does tend to cost more than some
alternatives. However, the cost of computer hardware is decreasing. Another
disadvantage is that there is only one viable candidate system on the market that would come
close to exemplifying this design: the Hewlett-Packard Integral Computer. This system
is currently capable of 1.5 MB of internal RAM (to include a networking interface card),
with a single 3 1/2 in., 710 KB capacity micro-floppy. Finally, in the case of the Integral, the
ET Station would be somewhat heavier than some alternatives, weighing approximately 23 lbs.
(assuming that the printer is removed).

Recommendations

Of the three designs discussed, the authors recommend Design #3. They believe that it
will give the Government the greatest flexibility as the ACAP System is field-tested
and the LCN configuration is adjusted to accommodate future requirements. In
addition, this design offers the greatest reliability and test item security.
No mechanical devices are required to maintain normal CAT-ASVAB testing (apart from
recording examinee data on the micro-floppy associated with an ET Station for back-up
purposes in the event that the station fails during a test session). Test item bank data will be
stored in volatile RAM instead of on removable media, ensuring the erasure of this sensitive
information immediately when power is removed.


PURPOSE

The purpose of this study was to assess the construct and predictive validity
of a Computerized Adaptive Testing (CAT) version of the Armed Services
Vocational Aptitude Battery (ASVAB). This study was conducted as part of
an effort to evaluate CAT-ASVAB as a replacement for the paper-and-pencil
ASVAB (P&P-ASVAB).


BACKGROUND 

Over  the  past  decade  numerous  empirical  studies  have  been  conducted  to 
evaluate  the  construct  and  predictive  validity  of  adaptive  aptitude  testing 
(Sympson  and  Moreno,  1985).  Overall,  the  results  of  these  studies  indicate  that 
adaptive  testing  is  as  valid  as  conventional,  paper-and-pencil  testing.  However, 
the  majority  of  these  studies  were  conducted  using  verbal  ability  or  arithmetic 
reasoning  tests.  Very  little  research  has  been  conducted  using  aptitude  tests 
measuring  other  types  of  ability. 

The battery of interest in this study, the ASVAB, consists of tests measuring
ten types of ability. The ASVAB is used by all military services for selection
and classification of military applicants. This study examined the construct
and predictive validity of all ASVAB tests.


APPROACH 

Examinees 

Examinees were military recruits scheduled for training in one of the military
service specialties selected for inclusion in this study. Over all services,
7,515 examinees were tested. Sample sizes for each specialty are shown in
Table 1.

Tests 

P&P-ASVAB. The P&P-ASVAB is a group-administered, conventional
battery in which all examinees answer all items in the same sequence. The
P&P-ASVAB consists of eight power tests and two speeded tests: General
Science (25),¹ Arithmetic Reasoning (30), Word Knowledge (35), Paragraph
Comprehension (15), Auto and Shop Information (25), Mathematics Knowledge
(25), Mechanical Comprehension (25), Electronics Information (20), Numerical
Operations (50), and Coding Speed (84). There are six parallel forms of the
P&P-ASVAB. This study used forms 8A and 9A. Number-correct scores
served as estimates of ability.

¹ Values in parentheses are test lengths.

CAT-ASVAB. The CAT-ASVAB used in this study is an experimental version
designed to measure the same abilities as those measured by the P&P-ASVAB.
There are nine power tests and two speeded tests: General Science
(197),² Arithmetic Reasoning (166), Word Knowledge (194), Paragraph
Comprehension (95), Auto Information (179), Shop Information (189),
Mathematics Knowledge (135), Mechanical Comprehension (70), Electronics
Information (168), Numerical Operations (50), and Coding Speed (84). The
nine power tests were administered adaptively using maximum information
item selection and Bayesian scoring. All power tests were terminated at a fixed
length of 15 items, except for Paragraph Comprehension, which was terminated
after 10 items. The two speeded tests were administered in a conventional
manner, with the test terminating after a fixed time. For the speeded tests,
number-correct scores were used to estimate ability.
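
The adaptive procedure described above can be sketched as follows. The paper does not state the item response model or the scoring formulas, so this sketch assumes a three-parameter logistic (3PL) model and a posterior-mean (EAP) estimate under a standard normal prior as one common form of Bayesian scoring. The function names, the ability grid, and the tiny item pool in the example are hypothetical.

```python
import math

D = 1.7  # conventional scaling constant for the logistic model


def p_correct(theta, a, b, c):
    """3PL probability of a correct response at ability theta."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def information(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = p_correct(theta, a, b, c)
    return (D * a) ** 2 * ((p - c) / (1.0 - c)) ** 2 * (1.0 - p) / p


GRID = [-4.0 + 0.1 * k for k in range(81)]  # ability grid from -4 to +4


def posterior_mean(responses):
    """Bayesian (EAP) ability estimate under a standard normal prior.

    responses: list of ((a, b, c), correct) pairs.
    """
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized normal prior
        for (a, b, c), correct in responses:
            p = p_correct(theta, a, b, c)
            w *= p if correct else (1.0 - p)
        weights.append(w)
    return sum(t * w for t, w in zip(GRID, weights)) / sum(weights)


def next_item(pool, used, theta):
    """Index of the unused item with maximum information at theta."""
    candidates = [i for i in range(len(pool)) if i not in used]
    return max(candidates, key=lambda i: information(theta, *pool[i]))


def run_test(pool, answer, length=15):
    """Administer a fixed-length adaptive test; answer(item) returns bool."""
    used, responses, theta = set(), [], 0.0
    for _ in range(min(length, len(pool))):
        i = next_item(pool, used, theta)
        used.add(i)
        responses.append((pool[i], answer(pool[i])))
        theta = posterior_mean(responses)  # rescore after every item
    return theta, responses


if __name__ == "__main__":
    demo_pool = [(1.0, b, 0.2) for b in (-2.0, -1.0, 0.0, 1.0, 2.0)]
    estimate, _ = run_test(demo_pool, lambda item: item[1] < 0.5)
    print(round(estimate, 2))
```

Each pass selects the most informative remaining item at the current estimate, records the response, and rescores, which is why 15 well-chosen items can stand in for a much longer conventional test.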

Criterion  Variables 

Since each service has numerous training schools, prediction of performance
for only a selected number could be assessed in this study. Schools
were selected so that a wide variety of specialties would be represented. In
addition, since military services use composites of test scores for selection and
classification, schools were selected so that school composite scores would span
all ASVAB tests. Table 1 lists the selected specialties and the criteria used for
each specialty. For the majority of specialties, final school grade (FSG) or
time to completion (TC) was used. However, for some specialties these measures
were not available. In these cases, analyses were performed to determine
which measure should be used.

Procedure 

Examinees were tested approximately two weeks after arrival at a recruit
training center, prior to entrance into training schools. Examinees were
group-administered the experimental CAT-ASVAB and those P&P-ASVAB tests that
were used in computing a recruit's school selection composite score. The tests
were counterbalanced so that half the examinees took CAT-ASVAB first and
half took P&P-ASVAB first. The CAT-ASVAB was administered using Sanyo
monitors and Apple III computers networked with a Corvus hard-disk drive.

Pre-enlistment ASVAB scores were collected on all examinees from DD
Form 1966. School performance data were collected after examinees had
completed training.

² Values in parentheses are item pool sizes.


Data  Analyses 

Predictive Validity. For each of the selected specialties, composite scores
were computed from CAT-ASVAB standardized scores and from P&P-ASVAB
standardized scores. Validity coefficients were obtained for each test version
by correlating school composite scores with school performance data. In order
to test for significant differences between test versions, t values were computed.
Since examinees in this study were a selected sample from the military applicant
population, validity coefficients were corrected for range restriction using
a multivariate approach (Lawley, 1943). No significance testing was performed
using corrected validity coefficients.
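
The two computations named above can be illustrated with stand-ins. The paper says only that t values were computed and that Lawley's multivariate correction was used, so this sketch substitutes Hotelling's t for two correlations that share a criterion, and the simpler univariate (Thorndike Case 2) correction in place of the multivariate one. The function names are hypothetical, and the correlation between the two composites, which the paper does not report, is an invented value.

```python
import math


def dependent_r_t(r_cat, r_pp, r_between, n):
    """Hotelling's t for two correlations that share the same criterion.

    r_cat, r_pp: validity coefficients of the two composites.
    r_between: correlation between the two composites themselves.
    Returns (t, degrees_of_freedom), with df = n - 3.
    """
    det = (1.0 - r_cat ** 2 - r_pp ** 2 - r_between ** 2
           + 2.0 * r_cat * r_pp * r_between)
    t = (r_cat - r_pp) * math.sqrt((n - 3) * (1.0 + r_between) / (2.0 * det))
    return t, n - 3


def correct_range_restriction(r, sd_ratio):
    """Univariate (Thorndike Case 2) correction for range restriction.

    sd_ratio: predictor SD in the unrestricted (applicant) population
        divided by its SD in the restricted (selected) sample.
    """
    return r * sd_ratio / math.sqrt(1.0 - r * r + (r * sd_ratio) ** 2)


if __name__ == "__main__":
    # Coefficients of .45 and .40 with n = 170; r_between = .80 is invented.
    t, df = dependent_r_t(0.45, 0.40, 0.80, 170)
    print(round(t, 2), df)
    print(round(correct_range_restriction(0.45, 1.5), 2))
```

The univariate correction conveys the idea, but it will not reproduce Lawley's multivariate results, which account for selection on several ASVAB tests at once.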

Construct Validity. For each service, the intercorrelation matrix of CAT-ASVAB
and P&P-ASVAB test scores was factor analyzed using the principal
axes method, followed by a varimax rotation to simplify the factor structure.

RESULTS 

Predictive  Validity 

Table  1  shows  the  validity  coefficients  obtained  using  CAT-ASVAB  and 
P&P-ASVAB.  Significance  tests  revealed  no  differences  between  the  validity 
coefficients  for  the  two  test  versions,  even  though  CAT-ASVAB  tests  are 
much  shorter  than  P&P-ASVAB  tests. 

Construct  Validity 

Table  2  shows  the  results  of  the  factor  analysis  using  data  from  the  Air 
Force  sample.  Four  factors  were  extracted,  based  on  an  eigenvalue  of  1.0  or 
greater.  These  factors  have  been  labeled  as  technical,  verbal,  mathematical, 
and  speeded  factors.  As  shown,  the  CAT-ASVAB  tests  had  similar  loadings  to 
those  of  the  corresponding  P&P-ASVAB  tests.  Findings  were  similar  for  the 
other  three  services. 
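
The eigenvalue rule used to fix the number of factors can be sketched as below, assuming it was applied to the eigenvalues of the correlation matrix. The Jacobi routine is a generic textbook method for symmetric matrices, not the software used in the study, and the small matrix in the example is invented.

```python
import math


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (
                    abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def factors_to_extract(corr, threshold=1.0):
    """Number of eigenvalues at or above the threshold."""
    return sum(1 for v in symmetric_eigenvalues(corr) if v >= threshold)


if __name__ == "__main__":
    # Invented example: two pairs of tests, correlated .6 within each pair.
    corr = [[1.0, 0.6, 0.0, 0.0],
            [0.6, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.6],
            [0.0, 0.0, 0.6, 1.0]]
    print(factors_to_extract(corr))
```

In the invented example, the two correlated pairs of tests yield two eigenvalues above 1.0, so two factors would be retained.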


CONCLUSIONS

These results suggest that CAT-ASVAB is a viable alternative to P&P-ASVAB.
In this study, CAT-ASVAB tests seem to be measuring the same
abilities as the P&P-ASVAB tests and predict school performance as well as
P&P-ASVAB tests, even though CAT-ASVAB test lengths are much shorter.
However, before replacing the P&P-ASVAB with CAT-ASVAB, the two versions
should be compared in terms of differential prediction by test version and
subgroup membership. Such analyses are currently being performed at
NPRDC.

Table 1

Validity Coefficients for CAT-ASVAB and P&P-ASVAB

                                   Sample  Validity Coefficients
Specialty               Criterion  Size    CAT-ASVAB      P&P-ASVAB

NAVY
  Radioman              TC          186    -.41 (-.69)    -.40 (-.58)
  Mess Management       FSG         170     .45 (.74)      .40 (.71)
  Hospital Corpsman     FSG         192     .56 (.77)      .50 (.80)
  Electronics Tech.     TC          143    -.41 (-.80)    -.46 (-.33)
  Hull Maint. Tech.     FSG         170     .35 (.75)      .35 (.75)
  Sonar Tech.           FSG         205     .46 (.79)      .46 (.30)

AIR FORCE
  Avionics              FSG         147     .36 (.80)      .56 (.30)
  Administration        FSG         208     .22 (.57)      .15 (.53)
  Aircraft Maint.       FSG         245     .54 (.31)      .49 (.79)
  Medical               FSG          95     .64 (.37)      .52 (.36)
  Security Police       FSG         456     .49 (.78)      .45 (.77)

MARINES
  Avionics              FSG         228     .49 (.31)      .48 (.31)
                        TC          228    -.58 (-.34)    -.54 (-.34)
  Administration
    Lejeune             FSG          72     .14 (.32)     -.05 (.13)
    Pendleton           FSG          39     .22 (.49)      .29 (.56)
  Aircraft Mech.
    School 1            FSG         181     .34 (.54)      .30 (.53)
    School 2            FSG          69     .50 (.70)      .47 (.70)
  Motor Transport       FSG         151     .29 (.47)      .30 (.48)
  Combat Engineer       Sum(All)    123     .59 (.32)      .56 (.80)
  Field Radio Opr.      Sum(1-4)    128     .33 (.43)      .21 (.37)
                        Sum(5-8)    128     .36 (.46)      .39 (.47)

ARMY
  Infantry              Sum(All)    329    -.24 (-.34)    -.28 (-.36)
  Mechanic
    Fort Dix            Average     198     .57 (.74)      .59 (.76)
    Fort Jackson        PC          186     .35 (.52)      .38 (.54)
  Motor Transport       Sum(All)    277    -.47 (-.63)    -.44 (-.53)
  Administration        Wtd Sum     145    -.35 (-.54)    -.44 (-.72)
  Telecom. Opr.         Sum(All)    169     .15 (.28)      .22 (.32)
  Medical               FSG         225     .53 (.35)      .59 (.33)

Note. TC is the time for course completion; FSG is a final school grade; Sum(All) is
a sum of training module scores; Sum(1-4) is a sum of scores on training modules one
through four; Sum(5-8) is a sum of scores on training modules five through eight;
Average is an average of the scores on all training school modules; PC is a percent
correct score on the end-of-course test; and Wtd Sum is a sum of module scores
minus a weighted typing score.

Numbers in parentheses are validity coefficients corrected for restriction in range.


Table 2

Results of a Factor Analysis of CAT-ASVAB
and P&P-ASVAB Scores from an Air Force Sample

Varimax Rotated Factor Matrix

            Factor 1   Factor 2   Factor 3   Factor 4
Test        (Tech)     (Verbal)   (Math)     (Speeded)

P&P-ASVAB
  AR          .38        .15        .66*       .31
  WK          .17        .82*       .08        .05
  PC          .16        .30*       .13        .16
  NO         -.07        .00        .34        .70*
  GS          .40*       .63*       .32       -.01
  CS          .00        .01        .06        .71*
  AS          .82*       .14       -.01        .00
  MK          .17        .35        .80*       .33
  MC          .65*       .21        .36*      -.01
  EI          .61*       .32*       .19       -.03

CAT-ASVAB
  AR          .31*       .27        .71*       .32
  WK          .15        .85*       .16        .03
  PC          .17        .68*       .33        .13
  NO         -.03        .13        .36        .65*
  GS          .33*       .73*       .38        .00
  CS         -.03        .10        .08        .71*
  AS          .90*       .15        .09       -.02
  MK          .08        .39        .74*       .33
  MC          .66*       .32        .30*      -.03
  EI          .64*       .42*       .36       -.08

* factor loading > .30


Background

A joint-service project is underway to develop a computerized adaptive
testing version of the Armed Services Vocational Aptitude Battery (CAT-ASVAB).
CAT-ASVAB is intended to replace the paper-and-pencil ASVAB
(P&P-ASVAB), used by the four military services to select and classify
applicants. The present P&P-ASVAB battery consists of ten aptitude subtests. These
subtests, and the number of items included in each for a given form of P&P-ASVAB,
are listed in Table 1 below.


Table 1. P&P-ASVAB Subtests and the Number of Items Included per Form

Subtest                             Number of Items

General Science (GS)                      25
Arithmetic Reasoning (AR)                 30
Word Knowledge (WK)                       35
Paragraph Comprehension (PC)              15
Numerical Operations (NO)                 50
Coding Speed (CS)                         84
Auto and Shop Information (AS)            25
Mathematics Knowledge (MK)                25
Mechanical Comprehension (MC)             25
Electronics Information (EI)              20


The Numerical Operations and Coding Speed subtests are speeded measures;
all other subtests are power measures. All examinees taking a given P&P-ASVAB
form are administered the same set of test items in all subtest areas.

While CAT-ASVAB is intended to measure the same subtest areas as
P&P-ASVAB, the power subtests of CAT-ASVAB will be administered adaptively.
Within each power subtest, different items will be selected for computer
administration to examinees, depending on their performance on previously
administered items. The testing process is thus individualized for examinees.


The  Problem 

The theoretical framework supporting the adaptive testing process to be
implemented in CAT-ASVAB is item response theory (Lord, 1980). The item
response theory methods to be used assume that each subtest is unidimensional.
The implication of this assumption is that each item included in a given subtest
should measure the same unitary construct, in addition to having specific and
error variance components associated with it. The Committee for an Evaluation
Plan for the Computerized Adaptive Vocational Aptitude Battery (Green,
Bock, Humphreys, Linn, & Reckase, 1982) states that unidimensionality is
always advisable for tests of ability, but is more important for adaptive tests.
Though some may be of the opinion that the IRT model to be applied in
CAT-ASVAB is strong enough to counteract potential problems with respect
to multidimensionality in CAT-ASVAB subtests, further study of this problem
is necessary. Unidimensionality may be considered an important problem for
an adaptive test because different items are administered to different examinees. For a
traditional test such as the P&P-ASVAB, all examinees are presented the same
test items, and are given an equal opportunity to respond to them irrespective
of their dimensionality. The individualized nature of adaptive tests may raise
questions related to test fairness. The present paper describes the approach to
be taken, progress, and plans for dimensionality analyses of CAT-ASVAB
items.

Approach 

A recent advance in the development of factor-analytic approaches to
exploring test dimensionality is "full-information item factor analysis." Bock,
Gibbons, and Muraki (1985) contend that of the various methods that have
been proposed for investigating dimensionality of item sets, item factor
analysis is the most sensitive and informative. This method of item factor
analysis is based on item response theory; it uses all data as distinct item
response vectors. Thurstone's multiple factor model is used. The procedure is
implemented by marginal maximum likelihood estimation and the EM algorithm.
Statistical significance of the addition of successive factors to the model
is tested by a likelihood ratio criterion. Provisions for the effects of guessing
on multiple choice items, and for omitted and not reached items, are included.
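
In outline, the model just described can be written as follows. The notation is assumed here for illustration and may differ from that of Bock et al. (1985): alpha_jk is the loading of item j on factor k, gamma_j is the item threshold, g_j is a lower asymptote allowing for guessing, sigma_j is the residual standard deviation, and phi is the multivariate normal density of the factors.

```latex
% Multiple factor model for the latent response of person i to item j
y_{ij} = \sum_{k=1}^{m} \alpha_{jk}\,\theta_{ik} + \varepsilon_{ij},
\qquad x_{ij} = 1 \iff y_{ij} > \gamma_j

% Probability of a correct response, with lower asymptote g_j for guessing
P_j(\theta) = g_j + (1 - g_j)\,
  \Phi\!\left(\frac{\sum_{k} \alpha_{jk}\theta_{k} - \gamma_j}{\sigma_j}\right)

% Marginal probability of response pattern x_l (maximized with the EM algorithm)
\tilde{P}_l = \int \prod_{j=1}^{n} P_j(\theta)^{x_{lj}}
  \,\bigl[1 - P_j(\theta)\bigr]^{1 - x_{lj}}\, \phi(\theta)\, d\theta

% Likelihood ratio criterion for adding a factor to an m-factor model
\Delta G^2 = 2\,\bigl(\ln L_{m+1} - \ln L_{m}\bigr) \sim \chi^2
```

The chi-square statistic is referred to degrees of freedom equal to the number of additional free parameters introduced by the extra factor.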

One  of  the  applications  of  this  methodology  to  real  data,  presented  by  the 
authors  as  evidence  for  the  accuracy  and  practical  utility  of  the  method,  is  an 
analysis  of  the  power  subtests  of  P&P-ASVAB.  The  analysis  was  conducted  in 
a  ten-percent  random  sample  of  data  from  the  Profile  of  American  Youth 
Study.  The  number  of  cases  used  in  the  analysis  was  1,178,  drawn  from  a  total 
sample  of  11,817  subjects.  The  details  of  the  item  factor  analyses  are  presented 
in  the  Bock  et  al.  (1985)  report.  For  the  purposes  of  this  presentation,  the 
results obtained for P&P-ASVAB are relevant to the selection and application
of an approach for studying test dimensionality in the intended replacement
battery, CAT-ASVAB. Thus they will be described briefly.

For  the  P&P-ASVAB  General  Science  test,  Bock  et  al.  (1985)  found  two 
significant  factors;  one  factor  was  interpreted  as  a  physical  science  factor  and 
the  other  factor  was  interpreted  as  a  biological  (or  health  science)  factor.  Two 
factors were also found for the Arithmetic Reasoning subtest. While the
second factor found was a minor one, the authors have interpreted it as a business
arithmetic factor. For the Word Knowledge subtest, clear evidence for a
second  factor  was  found,  though  the  factor  has  no  apparent  relationship  to 
item  content.  For  the  Auto  and  Shop  Information  subtest,  the  authors  found 
clear  evidence  for  two  factors  separating  the  two  types  of  items.  For  the 
Mathematics  Knowledge  subtest,  two  significant  factors  were  also  found;  one 
factor  involved  items  requiring  knowledge  of  formal  algebra  and  the  second 
factor  involved  numerical  calculation  and  mathematical  reasoning.  Only  one 
factor was found for the Paragraph Comprehension, Mechanical Comprehension,
and Electronics Information subtests.

Bock  et  al.  (1985)  conclude  that  the  applications  of  the  procedure  reported 
in  their  paper  show  that,  for  moderately  large  samples,  minor  factors  can  be 
detected.  The  procedure  is  recommended  as  an  exploratory  technique  in 
searching  for  item  features  that  are  responsible  for  individual  differences  in 
cognitive  test  performance. 

Given the reported adequacy and practical utility of the full-information
factor analysis approach to the detection of multidimensionality in item sets,
and the indications of multiple factors found by Bock et al. (1985) in the subtests
of the P&P-ASVAB battery which CAT-ASVAB is intended to replace,
NPRDC plans to make use of this procedure in studying the dimensionality of
the CAT-ASVAB items.

Progress  and  Plans 

One of the requirements of the adaptive testing process is the development
and calibration of a large bank of test questions, covering a wide range
of difficulty for the intended test-taking population, for use in the item selection
process. In a study carried out by the Air Force Human Resources
Laboratory, and conducted under contract by Assessment Systems Corporation,
a bank of 2,118 test items intended for operational use in CAT-ASVAB
was developed and calibrated in 1983. This bank of test questions covers all of
the P&P-ASVAB power subtest areas, with more than 200 items developed for
each subtest. On the basis of statistical and judgemental criteria, items will be
selected from the total number available for inclusion in the final CAT-ASVAB
battery. One of the criteria for accepting an item for inclusion in the
final battery is measurement of a one-dimensional universe represented by a
pool of items. This pool may represent a subtest, or, alternatively, a subset of
items within a subtest. From unidimensional pools of items, individual test
items will, of course, be selected for administration to examinees in the adaptive
testing process.

Preliminary  full-information  item  factor  analyses  of  one  of  the  more 
suspect  subtest  areas  of  the  CAT-ASVAB  item  bank,  given  the  results  of  Bock 
et al. (1985), have been conducted. Analyses of the General Science items indicate
that indeed the item factor analysis procedure is sensitive to the presence
of minor factors in the data. The CAT-ASVAB General Science items were
developed to measure three main subject areas: (1) Life Science; (2) Physical
Science; and (3) Earth Science (Prestwood, Vale, Massey, & Welsh, 1985).
Items  were  randomly  assigned  to  four  test  booklets  for  calibration  purposes 
and  approximately  2,500  examinees  were  tested  with  each  booklet.  Inspection 
of  the  individual  items,  the  content  categories  which  they  were  written  to 
represent, and the preliminary item factor analysis results suggests some clustering
of items in terms of subject matter. Where a preponderance of items of
one  type  have  been  assigned  to  a  booklet,  the  procedure  appears  to  be  sensitive 
to  the  detection  of  minor  factors. 

It  is  NPRDC’s  intention  to  conduct  full-information  item  factor  analyses 
for  each  power  subtest  to  be  included  in  the  CAT-ASVAB  battery.  This  will 
be  done  in  a  joint  calibration  analysis  of  both  the  CAT-ASVAB  items  and 
P&P-ASVAB  items  for  each  subtest.  While  CAT-ASVAB  is  intended  to 
replace  P&P-ASVAB,  both  batteries  will  concurrently  be  administered  in  an 
operational  setting.  CAT-ASVAB  subtest  scores  must  be  scaled  to  those  of 
P&P-ASVAB  and,  as  is  the  case  for  the  present  P&P-ASVAB,  a  single  CAT- 
ASVAB  score  will  be  generated  for  each  subtest.  The  joint  item  factor  analysis 
of  CAT-ASVAB  items  and  P&P-ASVAB  items  for  each  subtest  is  expected  to 
result  in  as  many  or  more  factors  than  those  determined  for  the  corresponding 
P&P-ASVAB subtest alone. Where more than one dimension is present, the
joint calibration will allow for transformation of the resulting item parameters
to congruence with those parameter estimates obtained from the analysis of
P&P-ASVAB items alone. Dr. Bruce Bloxom of Vanderbilt University is
presently working on a procedure for combining m multidimensional ability
scores into a single score comparable to that obtained on P&P-ASVAB.

Recommendations 

The individualized nature of adaptive tests raises interesting test development
issues related to test fairness. The adaptive testing process to be implemented
in CAT-ASVAB is supported by item response theory methods which
assume  that  a  single  underlying  trait  is  measured  within  a  subtest.  Bock  et  al. 
(1985)  have  provided  a  promising  procedure  for  investigating  conformance  to 
this  assumption.  It  is  recommended  that  the  full-information  item  factor 
analysis  approach,  currently  being  used  as  an  exploratory  technique  in  the 
development  of  CAT-ASVAB  item  pools,  be  considered  for  use  in  other  test 
development  applications  involving  item  response  theory  methods. 

REFERENCES 

Bock, R.D., Gibbons, R., & Muracki, E. (1985, August). Full-information item
factor analysis (MRC Report 85-1). Chicago, IL: Methodology Research
Center/NORC.

Green, B.F., Bock, R.D., Humphreys, L.G., Linn, R.B., & Reckase, M.D.
(1982, May). Evaluation plan for the Computerized Adaptive Vocational Aptitude
Battery (Research Report 82-1). Baltimore, MD: The Johns Hopkins
University, Department of Psychology.

Lord, F.M. (1980). Applications of item response theory to practical testing
problems. Hillsdale, NJ: Erlbaum.

Prestwood, J.S., Vale, C.D., Massey, R.H., & Welsh, J.R. (1985, September).
Development of an adaptive item pool for the ASVAB (AFHRL-TR-85-19).
Brooks Air Force Base, TX: Air Force Human Resources Laboratory.




SEX  DIFFERENCES  IN  IRT  TRUE-SCORE  EQUATING 


Daniel  O.  Segall 

Navy  Personnel  Research  and  Development  Center 

William F. Kieckhaefer
RGI,  Incorporated 

Kathleen  E.  Moreno 

Navy  Personnel  Research  and  Development  Center 

Introduction 

The  current  Armed  Services  Vocational  Aptitude  Battery  (P&P-ASVAB)  is  a 
paper-and-pencil  test  with  a  fixed  sequence  of  test  items.  The  Navy  Personnel 
Research  and  Development  Center  is  developing  a  computerized  adaptive  version 
(CAT-ASVAB)  as  a  possible  replacement  for  that  test.  The  CAT-ASVAB  will  tailor 
the  difficulty  of  the  test  items  administered  to  the  individual  from  responses  to  earlier 
items. This testing method is expected to increase the efficiency of selecting and classifying
new accessions.

If  the  CAT-ASVAB  becomes  operational,  it  will  be  implemented  gradually  so 
that  some  examinees  will  be  administered  the  CAT-ASVAB  while  others  will  receive 
the  P&P-ASVAB.  The  two  versions  use  different  estimators  of  ability.  The  P&P- 
ASVAB  uses  a  number  correct  score  for  each  examinee,  while  the  CAT-ASVAB 
computes a Bayesian estimate of ability. Scores from the two versions must be
equated  so  that  personnel  selection  and  classification  decisions  do  not  vary  between 
test  versions. 

Braun and Holland (1982) give one definition of test equating: Form-X and
Form-Y are equated on population P if the distribution of the
transformed y scores in population P is the same as the distribution of the
untransformed x scores. Applying this definition to the current problem, the
CAT-ASVAB and P&P-ASVAB are equated on population P if the distribution of the
transformed CAT-ASVAB scores in this population is the same as the distribution of
the P&P-ASVAB scores. This definition has the desirable quality of assuring equal
flow rates for the two versions of the ASVAB.

Unfortunately, two tests that are equated on population P may not be equated
for various subpopulations that are included in P. Test scores that are equated for
the  military  applicant  population  may  not  be  equated  for  either  the  population  of 
female  applicants,  or  the  population  of  male  applicants. 

This paper investigates the application of IRT true-score equating to the experimental
CAT-ASVAB. An effort is made to determine whether CAT-ASVAB scores
can  be  transformed  to  a  paper-and-pencil  scale  without  placing  either  males  or 
females  at  a  disadvantage  relative  to  their  P&P-ASVAB  scores. 

Method 

SUBJECTS. During April 1984, 200 male and 200 female Army recruits at
Fort  Jackson,  South  Carolina  participated  in  this  study.  Each  subject  took  both  the 
CAT-ASVAB  and  the  P&P-ASVAB. 

SUBTESTS. The five subtests of the P&P-ASVAB selected for this study were
taken  from  Form  8a:  Arithmetic  Reasoning  (AR),  Word  Knowledge  (WK),  General 
Science  (GS),  Paragraph  Comprehension  (PC),  and  Numerical  Operations  (NO). 

There  were  five  CAT-ASVAB  subtests  designed  to  measure  the  same  aptitude  as 
the  P&P-ASVAB  subtests  mentioned  above:  AR,  WK,  PC,  NO,  and  GS. 

PROCEDURES. Proctors seated all subjects in the testing area for the ASVAB
and instructed them to complete the Privacy Act Statement. Then, half the subjects
took seats in the adjacent CAT-ASVAB testing area. Subjects in this condition completed
the CAT-ASVAB first and the P&P-ASVAB second. Subjects in the other condition
(i.e., the remaining half of the subjects) took the P&P-ASVAB first and the
CAT-ASVAB second.

Equating  Transformations 

Although each CAT-ASVAB subtest is designed to measure the same cognitive
ability as its P&P-ASVAB counterpart, the two versions are not on comparable scales.
The CAT-ASVAB power subtests produce an ability estimate ($\hat{\theta}$) while the
P&P-ASVAB produces a number correct score ($x$). Although the CAT-ASVAB subtest of
Numerical Operations does produce a number-correct score similar to the
P&P-ASVAB, the method of responding has been shown to affect the CAT-NO score
distribution. Thus for each content area, some method of equating the two versions is
necessary.

We  used  two  different  procedures  to  equate  the  five  subtests,  depending  on 
whether  the  subtest  was  adaptive  or  speeded.  We  used  a  procedure  similar  to  one 
recommended by Green, Bock, Linn, and Reckase (1985) to equate the CAT-
ASVAB  power  subtests.  This  procedure  transforms  the  thetas  into  expanded 
expected  number  correct  (EENC)  scores.  We  used  an  equipercentile  method  to 
equate  the  CAT-NO  speeded  subtest.  Both  procedures  are  described  in  detail  below. 

POWER-SUBTESTS. For the power-subtest scores, we performed a two-stage
equating. First we calculated expected number correct scores for the four
CAT-ASVAB power subtests. Equation (1) transformed the estimated CAT ability,
$\hat{\theta}_a$, for each person $a$, to an expected number correct score $\xi_a$:

$$\xi_a = \frac{1}{6} \sum_{k=1}^{6} \sum_{i=1}^{n} P(a_{ik}, b_{ik}, c_{ik}; \hat{\theta}_a), \qquad (1)$$

where $n$ equals the number of items in the P&P subtest, and
$P(a_{ik}, b_{ik}, c_{ik}; \hat{\theta}_a)$ represents the item characteristic curve defined
by the three parameter logistic model evaluated at $\hat{\theta}_a$, for item $i$ of
P&P-ASVAB form $k$ (where $k = 1, 2, \ldots, 5, 6$).

We substituted item parameter estimates $\hat{a}_{ik}$, $\hat{b}_{ik}$, $\hat{c}_{ik}$
(Sympson & Hartmann, 1985) for the values $a_{ik}$, $b_{ik}$, $c_{ik}$ in equation (1).
Then for each estimated ability, $\hat{\theta}_a$, an expected number correct score,
$\xi_a$, was obtained from (1).
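
The computation in equation (1) can be sketched as follows. This is only an illustration under stated assumptions: the usual three-parameter logistic form is taken with a scaling constant D = 1.7 (the text does not state the constant), the summed item probabilities are averaged over forms, and the function names and item parameters are hypothetical rather than taken from the study.

```python
import math

D = 1.7  # assumed scaling constant; not stated in the text


def p_3pl(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def expected_number_correct(theta, forms):
    """Average over forms of the summed item probabilities at theta.

    `forms` is a list of forms; each form is a list of (a, b, c) tuples.
    """
    totals = [sum(p_3pl(theta, a, b, c) for (a, b, c) in items)
              for items in forms]
    return sum(totals) / len(totals)


# Hypothetical parameters: two forms of three items each.
forms = [
    [(1.0, -1.0, 0.20), (1.2, 0.0, 0.20), (0.8, 1.0, 0.25)],
    [(0.9, -0.5, 0.20), (1.1, 0.3, 0.20), (1.0, 1.2, 0.20)],
]

for theta in (-2.0, 0.0, 2.0):
    print(theta, round(expected_number_correct(theta, forms), 3))
```

Because each item characteristic curve increases with ability, the resulting score increases with the ability estimate and stays between the sum of the guessing parameters and the number of items.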

Second, we applied a linear transformation to the scores computed from equation
(1). This produced expanded expected number correct (EENC) scores that possessed
the same mean and variance as the observed scores of the corresponding
P&P-ASVAB. Finally, we rounded these EENC scores to the nearest integer value to
produce the CAT-EENC scores. We repeated all the above procedures for each of the
CAT-ASVAB power subtests (AR, WK, PC, and GS).
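
A minimal sketch of this second stage, assuming population (not sample) standard deviations and half-up rounding, neither of which is specified in the text. The function names and the score values are hypothetical.

```python
import math
import statistics


def linear_equate(enc_scores, pp_scores):
    """Rescale ENC scores to the mean and variance of the P&P scores."""
    mean_enc = statistics.mean(enc_scores)
    mean_pp = statistics.mean(pp_scores)
    slope = statistics.pstdev(pp_scores) / statistics.pstdev(enc_scores)
    intercept = mean_pp - slope * mean_enc
    return [slope * score + intercept for score in enc_scores]


def round_half_up(value):
    """Round to the nearest integer, with halves rounded upward."""
    return math.floor(value + 0.5)


# Hypothetical expected number correct scores and observed P&P scores.
enc = [10.2, 12.5, 15.1, 18.4]
pp = [8, 12, 17, 23]

eenc = linear_equate(enc, pp)
cat_eenc = [round_half_up(score) for score in eenc]
print(cat_eenc)
```

Python's built-in `round` rounds halves to the nearest even integer, which is why a separate helper is used here.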

SPEEDED-SUBTEST. Numerical Operations (NO) was the only speeded subtest
included in this study. This subtest is not adaptive and produces a number correct
score.


We  obtained  NO-equating  data  from  1,364  Army  recruits.  Each  recruit  received 
both  versions  of  the  subtest:  (a)  the  CAT-NO  subtest,  and  (b)  forms  8  or  9  of  the 
P&P-NO  subtest. 

We used an equipercentile method to equate CAT-NO to P&P-NO. First, we
obtained  CAT-NO  and  P&P-NO  number  correct  scores  at  99  different  cumulative- 
percentile  points.  These  cumulative-percentile  points  were  obtained  at  unit  intervals 
ranging  from  1  to  99,  inclusive.  Next,  we  used  least-squares-polynomial  regression  to 
smooth  the  equipercentile-equating  function.  We  calculated  polynomial  regressions  of 
several  different  orders  and  judged  the  quintic  regression  to  provide  the  best  fit  based 
on a root-mean-squared-error criterion. We equated scores below the second percentile
(of the CAT-NO score distribution) using linear interpolation from the (0,0) point
to  the  point  corresponding  to  the  second  percentile.  Then  we  obtained  smoothed- 
equipercentile  estimates  for  each  number-correct  score  using  the  estimated- 
polynomial-regression  equation  (or  by  linear  extrapolation).  Finally,  these  values 
were  rounded  to  the  nearest  integer. 
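
The unsmoothed core of an equipercentile equating can be sketched as below. This illustration omits the polynomial smoothing step and instead interpolates linearly between percentile pairs, so it is not the procedure used in the study. The nearest-rank percentile definition, the averaging of tied points, and all function names are assumptions.

```python
def percentile_points(scores, pcts=range(1, 100)):
    """Score at each cumulative-percentile point (nearest-rank definition)."""
    ordered = sorted(scores)
    n = len(ordered)
    # Integer ceiling division avoids floating-point error in p * n / 100.
    return [ordered[(p * n + 99) // 100 - 1] for p in pcts]


def equipercentile_pairs(x_scores, y_scores):
    """Pairs (x, y) of scores that share the same percentile rank."""
    by_x = {}
    for x, y in zip(percentile_points(x_scores), percentile_points(y_scores)):
        by_x.setdefault(x, []).append(y)
    # Several percentiles can share one x score; average their y scores.
    return sorted((x, sum(ys) / len(ys)) for x, ys in by_x.items())


def equate(x, pairs):
    """Map x to the y scale by linear interpolation, anchored at (0, 0)."""
    points = [(0.0, 0.0)] + [p for p in pairs if p[0] > 0]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x must be non-negative")


# Hypothetical score distributions in which y is always twice x.
cat_no = list(range(1, 101))
pp_no = [2 * score for score in cat_no]
pairs = equipercentile_pairs(cat_no, pp_no)
print(equate(50, pairs))
```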

Results 

Kolmogorov-Smirnov two-sample (KS) tests were used to test the difference
between  the  P&P-ASVAB  and  equated  CAT-ASVAB  distribution  functions.  KS  tests 
were  run  for  all  five  subtests. 

The  total  sample  was  first  randomly  divided  into  two  groups,  with  the  restriction 
that  total  group  size  was  approximately  equal  and  the  number  of  females  and  males 
did  not  differ  by  more  than  one  across  the  two  groups.  The  next  step  computed 
expected  number  correct  (ENC)  scores  from  CAT-ASVAB  thetas  for  each  group. 
The  linear  transformation  which  transforms  ENC  scores  to  EENC  scores  was 
estimated  for  each  group  separately.  Each  transformation  was  then  used  to  compute 
the  CAT-EENC  scores  for  that  group. 

Two comparisons were made: (1) a comparison of the CAT-EENC score distribution
of Group A to the P&P-ASVAB score distributions of Group B and, (2) a
comparison  of  the  CAT-EENC  score  distribution  of  Group  B  to  the  P&P-ASVAB 
score  distributions  of  Group  A.  Differences  were  tested  using  the  KS  statistic.  The 
above  procedure  was  repeated  separately  for  the  male,  female,  and  combined  samples. 
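
For readers unfamiliar with the statistic, a two-sample KS comparison can be sketched as follows. The sketch assumes that the reported Z is the maximum difference between the two empirical distribution functions scaled by the square root of n1*n2/(n1+n2), and that the probability comes from the usual asymptotic series; the paper does not state either point. Function names and data are hypothetical.

```python
import math


def ks_two_sample(x, y):
    """Return (D, Z): max ECDF difference and its scaled form."""
    xs, ys = sorted(x), sorted(y)
    n1, n2 = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n1 and j < n2:
        value = min(xs[i], ys[j])
        # Step both ECDFs past every observation equal to `value`.
        while i < n1 and xs[i] == value:
            i += 1
        while j < n2 and ys[j] == value:
            j += 1
        d = max(d, abs(i / n1 - j / n2))
    z = d * math.sqrt(n1 * n2 / (n1 + n2))
    return d, z


def ks_probability(z, terms=100):
    """Asymptotic two-sided probability for a KS Z value."""
    if z < 0.2:
        return 1.0  # the series is numerically unstable near zero
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


# Hypothetical score samples for two groups.
group_a = [1, 2, 3, 4]
group_b = [3, 4, 5, 6]
d, z = ks_two_sample(group_a, group_b)
print(d, z, ks_probability(z))
```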

Table  1  presents  the  results  of  the  KS  tests.  Each  comparison  examines  the 
difference between the CAT-EENC (or number correct for the NO subtest) distribution
function and the corresponding P&P-ASVAB distribution function. Only two
comparisons  were  significant  at  the  .05  level:  (1)  the  comparison  of  scores  on  NO  for 
females,  and  (2)  the  comparison  involving  the  AFQT  composite  for  females. 

Discussion 

The results of the KS analysis indicate that the IRT true-score equating procedure
provides similar distributions for the two versions of the ASVAB. Neither
males nor females appear to be placed at a disadvantage relative to their P&P-ASVAB
scores.

Table 1

Kolmogorov-Smirnov Two-Sample Tests

                          Comparison One          Comparison Two
Sex          Subtest      Z        Probability    Z        Probability

Males        AR            .927    .36             .480    .98
             WK            .758    .61             .478    .98
n1 = 72      PC            .677    .75             .541    .93
n2 = 73      NO            .883    .42             .500    .96
             GS            .816    .52             .715    .69
             AFQT          .926    .36             .500    .96

Females      AR           1.178    .13             .478    .98
             WK            .970    .30             .904    .39
n1 = 99      PC            .460    .98             .547    .93
n2 = 100     NO           1.679    .01             .676    .75
             GS           1.078    .20             .901    .39
             AFQT         1.444    .03             .600    .86

Combined     AR            .737    .65             .517    .95
             WK            .884    .42             .948    .33
n1 = 171     PC            .472    .98             .642    .81
n2 = 173     NO           1.223    .10             .684    .74
             GS            .476    .98             .845    .47
             AFQT          .802    .54             .474    .98

Reducing  the  Predictability  of  Adaptive  Item  Sequences 

C. Douglas Wetzel
Navy Personnel Research & Development Center
San Diego, CA 92152

James R. McBride
The Psychological Corporation
San Diego, CA 92101

Previous research into the psychometric properties of Computerized Adaptive
Testing (CAT) has shown that adaptive tests are much more efficient than conventional
tests (e.g., Weiss, 1974, 1982). Different methods of adaptive testing vary widely in
efficiency. The most efficient are those in which test items are chosen one at a time, in a
manner which optimizes some function of the difference between item difficulty and the
current estimate of the examinee's ability. However, many optimization CAT strategies
yield a predictable sequence of test items early in testing that could lead to
overexposure of items and possible compromise. This is because the possible sequences of
test items form a binary tree. The same item will always be chosen first; only two items
can be chosen second, and so on. As a result, all examinees who answer the first several
items the same way will encounter identical sequences of test items, making compromise
easy and almost inevitable.

This  repetition  of  predictable  item  sequences  is  illustrated  in  Figure  1  with  plots  of 
individual  examinee  ability  estimates  as  a  function  of  test  length.  The  course  of  testing 
for each examinee is traced up the page, with all examinees starting at an ability
estimate  of  zero  prior  to  being  given  their  first  test  question  (i.e.  item  zero).  It  can  be 
seen  that  after  the  first  test  item  is  administered,  there  are  only  two  possible  ability 
estimates  from  right  or  wrong  responses,  and  after  the  second  item  there  are  just  four 
possible  ability  estimates,  and  so  forth.  As  the  number  of  items  increases,  the  common 
paths  shown  early  in  testing  fan  out  into  a  number  of  unique  ability  estimates.  Smaller 
changes in the ability estimates are found later in testing (items 10-15) where they
"home in" on a region of the ability continuum and become more reliable.

Figure 1. Item-by-item ability estimates ($\hat{\theta}$) for 75 Marine Corps recruits given a
15-item adaptive test using Owen's Bayesian strategy (Owen, 1969, 1975).


This situation suggests the need for a method which retains the efficiency of the
mathematically optimal adaptive strategies, but one which eliminates the occurrence of
predictable sequences of test items. One method of avoiding predictable item sequences
is to choose an item at random from among a set of nearly optimal items. In the present
work, a stratified maximum information (STMI) strategy was selected for extensive
study of random selection from several good or informative items taken at different
points in the sequence of items. A question to address is whether this technique to
reduce the repeated exposure of the very best items in the bank would reduce the
psychometric quality of the resultant adaptive test, relative to several other adaptive
and conventional test strategies.

APPROACH 

A two-stage computer simulation was used to investigate the effect of various item
selection strategies on the psychometric characteristics of the resultant tests. The first
stage simulation generated item parameter estimates with typical error characteristics.
This  item  bank  was  used  for  the  second  stage  in  simulated  administrations  of  adaptive 
and  conventional  tests  for  normal  and  rectangular  examinee  true  ability  distributions. 

Generating Fallible Item Parameter Estimates: A simulated item bank was
created based on real item parameters representative of a "live" testing situation. The
item bank consisted of two parameter sets, the "true" parameters $\{a, b, c\}$, and
simulated estimates $\{\hat{a}, \hat{b}, \hat{c}\}$. Unlike many synthetic item banks (e.g., Wetzel &
McBride, 1983), real items written for live examinees often yield unique item banks
without uniform distributions when broken down by each item parameter, and may have
a distinct positive correlation between the a-parameter and b-parameter
(Sympson, Weiss, & Ree, 1982). To achieve a test information curve that
approximated that of real test items, a simulated bank was based on item parameters
from real test items. Estimated item parameters from real items were used as true
parameters for a simulation of the item calibration phase. The 200 real item parameters
were obtained from J. B. Sympson of NPRDC from his calibration of five ASVAB
content areas: word knowledge, arithmetic reasoning, paragraph comprehension, general
science, and mathematical knowledge. These banks had each been calibrated with
LOGIST (Wood et al., 1976) on approximately 1500-2000 live examinees for entrance
into the armed forces. In the present study, a stratified random sample from these five
banks (970 total items, from individual banks of 180-210) was taken by randomly
selecting 40 items from each of the five banks to yield a new combined total of 200
items. These 3-parameter logistic estimates were then used as if they were 'true' item
parameters in a simulation in which examinees were administered all 200 items.
Examinee item responses were simulated by using the 3-parameter logistic model
(Birnbaum, 1968) to generate simulated binary responses to the test items using a
probability sampling technique often employed for this purpose (Vale and Weiss, 1975).
If a random number drawn from a uniform distribution on the interval (0,1) was less
than the 3-parameter logistic model probability of a correct response $P(\theta)$, then the
examinee was credited with a correct answer; otherwise an incorrect item response was
specified. The simulated correct and incorrect item responses were created for 1500
normally distributed simulated examinees for these 200 items and then calibrated for the
present study with LOGIST. These parameter estimates were then joined with the
generating "true" parameters for simulations of the various test strategies studied here.
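
The response-generation rule described above can be sketched as follows. This is an illustration only: the scaling constant D = 1.7, the random seed, the three item parameter triples, and the function names are assumptions, and the LOGIST calibration step is not reproduced.

```python
import math
import random

D = 1.7  # assumed scaling constant; not stated in the text


def p_3pl(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def simulate_response(theta, item, rng):
    """Score 1 if a uniform (0,1) draw falls below P(theta), else 0."""
    a, b, c = item
    return 1 if rng.random() < p_3pl(theta, a, b, c) else 0


def simulate_data(items, n_examinees, rng):
    """Binary response matrix for normally distributed examinees."""
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_examinees)]
    return [[simulate_response(theta, item, rng) for item in items]
            for theta in thetas]


rng = random.Random(1985)  # fixed seed so the run is repeatable
items = [(1.0, -2.0, 0.2), (1.0, 0.0, 0.2), (1.0, 2.0, 0.2)]  # hypothetical
data = simulate_data(items, 1500, rng)
proportion_correct = [sum(row[j] for row in data) / len(data)
                      for j in range(len(items))]
print(proportion_correct)
```

With these hypothetical parameters the easiest item is answered correctly most often and the hardest least often, as the model implies.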

Examinee True Ability ($\theta$) Distributions: Each test strategy simulation run
was conducted twice with 1900 simulated examinees: (1) once with a Rectangular $\theta$
Distribution of 19 groups (100 examinees each) .25 $\theta$ units apart over the interval

minimize the expected value of the posterior variance of the ability distribution (Owen,
1969, 1975). The ability estimate (distribution) is updated after each item and the
parameters of the Bayes posterior distribution are used as parameters of the prior
distribution for the next item.

Stratified Maximum Information Test - Bayesian Scoring (STMI-B). This adaptive
test selects the item with approximately the greatest item information ($I(\theta)$) at the
current Bayesian ability estimate $\hat{\theta}$. Items are selected from a prearranged information
table consisting of stratified lists of information values calculated for fixed $\theta$ levels.
Each list of the information table contains information values arranged in descending
order of the values of their information functions at the midpoints of a series of narrow
intervals of ability. In this study, the lists, in .125 wide $\theta$ increments, spanned the ability
range from -2.25 to +2.25. The ability estimate was updated after each item with the
same Bayesian ability estimation procedure employed in the Owen's test above. This
STMI-B strategy is a 'hybrid' (Wetzel & McBride, 1983) between previous strategies in
that the same information table method is employed, but Bayesian ability scoring is used
instead of maximum likelihood scoring (Sympson, Weiss & Ree, 1982).

Seven versions of the STMI-B test were simulated to investigate probabilistic item
sequences produced by randomizing the choice among items. The top k consecutive
items with greatest information in an information table's list were selected and held in a
temporary vector. One of these k items was then selected randomly, with each of the
items having equal 1/k selection probabilities. This randomization among the most
informative items within a given list of an information table occurred in the list closest
to the current ability estimate. Two types of randomization conditions were studied,
which differed in whether the 1/k probability was constant for all 15 test items or
varied as a function of the order of item administration.

Constant Selection Ratios for All Items Administered

(a) 1/1   (b) 1/5   (c) 1/10   (d) 1/20   (e) 1/40

Selection Ratios Adjusted According to Test Length

(f)  l  .  1  l  I  t  l  J  I  I  II  (k)  I  lo  I  s  l  i>  l  l  l  j  i  j 

The constant selection ratios remained the same for each of 15 items administered
(a-e). The first adjusting selection ratio strategy (f) selected the first item in the test
from the best five available items in the current information table list (1/5), the second
item from four (1/4), the third from three (1/3), the fourth from only two (1/2), and
then the fifth through the fifteenth items were each the single most informative item
available (1/1). The second adjusting ratio strategy (g) used the first four ratios shown
for the first four items and then used a ratio of 1/2 for the fifth through the fifteenth
items. These adjusting ratio strategies randomize more for early items since they are
most subject to compromise. Randomization decreases as the test proceeds so the
strategy will select more appropriate items when the ability estimate is closest to its
terminal value.
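
The random choice among the k most informative items can be sketched as follows. This is an illustration, not the simulation code used in the study: an information-table list is represented simply as item identifiers already sorted by descending information, the schedule shown is the one described for strategy (f), and all names are hypothetical.

```python
import random

# Denominators k for strategy (f): 1/5, 1/4, 1/3, 1/2, then 1/1 thereafter.
F_SCHEDULE = [5, 4, 3, 2, 1]


def k_for_position(schedule, position):
    """Denominator k for the item at a 1-based test position.

    The last entry of the schedule is reused for all later positions,
    so a constant ratio is simply a one-element schedule such as [10].
    """
    return schedule[min(position, len(schedule)) - 1]


def select_item(info_list, administered, k, rng):
    """Pick one of the top k unused items, each with probability 1/k."""
    available = [item for item in info_list if item not in administered]
    return rng.choice(available[:k])


rng = random.Random(0)
info_list = [17, 4, 42, 8, 15, 23, 16]  # hypothetical ids, best item first
administered = set()
for position in range(1, 6):
    k = k_for_position(F_SCHEDULE, position)
    item = select_item(info_list, administered, k, rng)
    administered.add(item)
    print(position, k, item)
```

In a full adaptive test the list consulted would change with the ability estimate after each response; a single list is reused here only to keep the sketch short.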

Maximum Information Full Search - Bayesian Scoring (MI-B). Each item is
selected by a search of the entire bank for the item with the greatest item information at
the exact value of the current estimated ability ($\hat{\theta}$). This exhaustive item-by-item
search is made by actually calculating item information for $\hat{\theta}$ "on-line" each time an item
is to be selected during test administration. This strategy was created to assess any
effect of granularity in the STMI-B test, where the $\theta$ continuum was divided into
discrete increments in rounding $\hat{\theta}$ to the nearest .125 midpoint. Previous studies using
the MI strategy have employed maximum likelihood scoring rather than the present
Bayesian scoring (e.g., McKinley & Reckase, 1981; Weiss, 1982).
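
A full search of this kind can be sketched as follows, using the standard item information function for the three-parameter logistic model. The scaling constant D = 1.7, the function names, and the three-item bank are assumptions made for illustration.

```python
import math

D = 1.7  # assumed scaling constant; not stated in the text


def p_3pl(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def item_information(theta, a, b, c):
    """Item information for the three-parameter logistic model."""
    p = p_3pl(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def most_informative(theta, bank, administered):
    """Index of the unused item with the greatest information at theta."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))


# Hypothetical bank of (a, b, c) triples.
bank = [(1.0, -2.0, 0.2), (1.0, 0.0, 0.2), (1.0, 2.0, 0.2)]
print(most_informative(0.0, bank, administered=set()))
```

The table-based strategies replace the call to `item_information` at the exact estimate with a lookup in the list whose midpoint is nearest to the estimate.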

Stratified Maximum Information Test - Maximum Likelihood Scoring (STMI-ML).
This strategy employs Bayesian ability estimation until at least one correct and one
incorrect item response have been obtained and then uses maximum likelihood
estimation for the remainder of the items in the test. Items were selected from an
information table calculated over the same .125 wide $\theta$ increments used for the STMI-B
strategy above. This strategy was first used by Sympson et al. (1982).

Weiss's Stradaptive Test. This mechanical adaptive strategy uses a pre-sorted item
bank divided into strata on the basis of the b-parameter and then arranges items
within each stratum in descending order of the values of their a-parameters (cf. Weiss,
1971). There were nine strata in this study, each 0.5 ability units wide, over the range
-2.25 to +2.25. Items were selected from the top of the stack in each stratum.
Branching to another stratum occurred after each item, branching up one stratum after
a correct response, and down one stratum after an incorrect response.
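The branching rule just described can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the program used in the study: the starting stratum (the middle one), the behavior at the top and bottom strata (stay in place), and all function names are hypothetical.

```python
def build_strata(bank, low=-2.25, width=0.5, n_strata=9):
    """Sort items into difficulty strata, each ordered by descending a.

    bank maps item id -> (a, b). Items whose b falls outside the range
    are left out (an assumption; the source does not say how they were handled).
    """
    strata = [[] for _ in range(n_strata)]
    for item_id, (a, b) in bank.items():
        k = int((b - low) // width)
        if 0 <= k < n_strata:
            strata[k].append((a, item_id))
    return [
        [item_id for _, item_id in sorted(s, key=lambda t: t[0], reverse=True)]
        for s in strata
    ]


def next_stratum(current, correct, n_strata=9):
    """Move up one stratum after a correct response, down one after an error."""
    step = 1 if correct else -1
    return min(max(current + step, 0), n_strata - 1)


def administer(strata, responses, start=4):
    """Return the item ids given for a fixed sequence of right/wrong responses.

    Items are taken from the top of the stack, so strata is modified in place.
    """
    current, given = start, []
    for correct in responses:
        if not strata[current]:
            break  # stratum exhausted
        given.append(strata[current].pop(0))
        current = next_stratum(current, correct)
    return given


# Hypothetical four-item bank: item id -> (a, b).
bank = {"i1": (1.0, 0.0), "i2": (1.5, 0.1), "i3": (0.8, 0.6), "i4": (1.2, -0.4)}
sequence = administer(build_strata(bank), [True, False, False])
```

Note that the rule uses only the right/wrong pattern; no ability estimate enters item selection, which is what makes the strategy "mechanical".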

Peaked Conventional Test. This conventional test was designed by selecting the 15
items with the greatest values of information at the central ability value of 0.0. All
simulated examinees were administered this same set of 15 items.

Flat Conventional Test. This test was created by selecting the item with the
greatest item information value at each of 15 equally distributed ability points over the
interval -2.0 to +2.0. All simulated examinees were administered this same set of 15
items, with item difficulty increasing as the test proceeded.

RESULTS 

Fidelity: The correlation or fidelity between true and estimated ability (r_θθ̂) for
the simulated test strategies was based on the normal θ distribution consisting of a
single group of 1900 examinees. All coefficients were based on Bayesian ability estimates,
except the STMI-ML strategy, which used maximum likelihood ability estimation. All
the optimization strategies yield similar fidelities (Owen's Bayesian {.951}, STMI-ML
{.950}, STMI-B {.955}, MI-B {.956}), with the Stradaptive test {.935} performing
slightly better than the conventional tests. The peaked conventional test {.893} is lower
than the flat conventional {.922}, since the peaked test does not span the extremes of
ability. As the denominator of the fixed selection ratio increases, the STMI-B test shows
a small decrease in fidelity {.953, .955, .952, .951, .936}, but even at 1/40 it never falls
below the conventional tests. The adjusting ratio STMI-B strategies yield fidelities {both
.957} comparable to the 1/1 and 1/5 fixed ratios and to the MI-B full search strategy.

Average Test Information: Test information was employed as an index of
precision, or of how well a set of items discriminates an ability level from nearby ability
levels; the reciprocal of the square root of information approximates the
standard error of an ability estimate (Lord, 1980). It was used here as a measure of the
appropriateness of the set of items administered for a given true ability level. "Test
information" {Σ I(θ)} is the sum of the individual "item information" {I(θ)} values
for the items administered in an individual examinee's test. The true item parameters
and true examinee ability were used to compute these test information values, which
were then averaged over the 100 examinees at each of the 19 levels of true ability.
Figure 2 shows the obtained test information (Lord, 1980) averaged over the 100
examinees in each of the 19 rectangular θ distribution groups for each test. A peak is
shown around θ ≈ +1.25, which is owed to the correlation of the a- and b-parameters
(  r  |  o 77.  »  0">9.  rt  -131  ). The magnitude of the correlation was
very similar to that obtained by Sympson et al. (1982).


[Figure 2 plot not reproduced; horizontal axis: TRUE ABILITY (θ).]

Figure 2. Average test information for test strategies over 19 true ability (θ) groups.


The left-hand panel of Figure 2 shows the adaptive tests to yield more test
information than the two conventional tests over a wide range of θ, excepting the
peaked conventional test at the narrow region around 0. The conventional tests
represent one extreme of mismatch, where the same fixed set of items is presented
regardless of the location of the examinee on the ability continuum. The Stradaptive
test generally yielded less test information than the other adaptive tests at lower ability
levels because it used a mechanical strategy that did not correct for guessing. The 1/1
STMI-B and STMI-ML tests yielded the best test information overall and were
practically equivalent to the Owen's Bayesian test.

The right-hand panel of Figure 2 shows eight versions of the maximum information
strategy using Bayesian scoring. For the five constant ratio STMI-B strategies {1/1,
1/5, 1/10, 1/20, & 1/40}, test information is degraded monotonically as the denominator
of the ratio increases, i.e., the selection set included items farther down in the
information table which had somewhat lower discriminations, higher guessing and more
inappropriate difficulties. As the ratio changes from 1/1 to 1/40, the amount of
obtained change becomes larger for each doubling of the denominator. Substantial
reductions were produced with the extreme 1/20 and 1/40 ratios, and a small but
acceptable degradation in test precision resulted when 1 of 5 available items was randomly
selected throughout the test.

The remaining tests shown in the right-hand figure panel all yielded test
information that was approximately the same as the STMI-B 1/1 condition and in excess
of the STMI-B 1/5 test. First, the MI-B condition achieved no more test information
than any other condition, indicating that the .125 wide increments on which the STMI
information table tests were based were small enough to closely approximate this full
search condition. Second, the two STMI-B adjusting selection ratio strategies yielded
test information approximately equivalent to the STMI-B 1/1 and MI-B conditions
which used no randomization at all.

CONCLUSIONS 

This work suggested the following conclusions: (1) The STMI strategy, with
Bayesian ability estimation, seems to work about as well as the best adaptive testing
strategies; (2) Predictable sequences of test items can be avoided by modifying STMI so
that items are selected at random from a nearly optimal set of items; (3) As long as that
set is small in number, the adaptive test will not lose an appreciable amount of
efficiency; (4) If the set is small to begin with, and gets progressively even smaller
(through specification of a shrinking set size), the adaptive test is virtually as efficient as
the strategy which chooses the optimal item every time.
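Conclusions (2) through (4) describe choosing at random from a small, nearly optimal set that may shrink as the test proceeds. The Python sketch below illustrates that idea only. The shrinking schedule shown (one fewer candidate per item administered) is purely hypothetical, as are the function names; the schedule actually used in the study is not restated in this section.

```python
import random


def select_from_top_k(info_by_item, used, k, rng):
    """Pick one item at random from the k most informative unused items.

    info_by_item maps item id -> information at the current ability estimate.
    Raises IndexError if no unused items remain.
    """
    candidates = sorted(
        (item for item in info_by_item if item not in used),
        key=lambda item: info_by_item[item],
        reverse=True,
    )
    return rng.choice(candidates[:k])


def shrinking_k(position, start_k=5):
    """Hypothetical schedule: the candidate set shrinks by one per item, down to 1."""
    return max(1, start_k - position)


rng = random.Random(0)  # seeded only so the illustration is repeatable
info = {"a": 0.9, "b": 0.7, "c": 0.5, "d": 0.1}
optimal = select_from_top_k(info, used=set(), k=1, rng=rng)
randomized = select_from_top_k(info, used={"a"}, k=2, rng=rng)
```

With k = 1 the rule reduces to the non-randomized 1/1 condition; larger k trades some information for less predictable item sequences.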
Speeded Tests: Can Computers Improve Measurement?

by 

John  H.  Wolfe 

Navy  Personnel  Research  and  Development  Center 
San  Diego,  California  92152-6800 

INTRODUCTION 

The Armed Services Vocational Aptitude Battery (ASVAB) contains two speeded
tests, Coding Speed (CS) and Numerical Operations (NO). In paper-and-pencil mode,
these tests are administered with a fixed time limit and scored by counting the number
of correct responses. Computerized administration of the same items offers several
interesting alternative methods of scoring. For example, one can administer the tests
with no time limit, so that everyone finishes the test, and then measure the time each
examinee used. Greaud and Green (1985) showed that scoring such a test with a "rate"
measure (equal to the number of correct responses divided by the test time) increased
overall test reliability. Computer scoring by rates has two advantages over ordinary
administration: (1) there is no "ceiling" effect for the fastest examinees, and (2) the
scores for the slowest examinees are based on the same number of items as the fastest
examinees, and therefore have improved reliability.

Further improvements in reliability can be expected from measuring the
examinee's response times for each item, and combining these times into an appropriate
total score or scores. As a starting point for proposing alternative scoring functions,
consider the Greaud and Green "rate":

$$\text{Rate} = \frac{\text{Number Correct}}{\text{Total Time}} \qquad (1)$$

By dividing the numerator and denominator by the total number of items, N, the
formula is seen to be equivalent to:

$$\text{Rate} = \frac{P_c}{\bar{T}} \qquad (2)$$

where $P_c$ is the proportion of correct responses and $\bar{T}$ is the sample mean of the item
response times, taken over all of the items.
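The equivalence of the two forms of the rate can be checked numerically. The following Python lines are a small worked example with made-up responses and times (in seconds); the function names are hypothetical.

```python
def rate_from_totals(n_correct, total_time):
    """Rate as number correct divided by total test time."""
    return n_correct / total_time


def rate_from_means(item_correct, item_times):
    """Rate as proportion correct divided by mean item time."""
    n = len(item_times)
    p_c = sum(item_correct) / n
    t_bar = sum(item_times) / n
    return p_c / t_bar


# Made-up data: four items, three answered correctly, 12 seconds in total.
correct = [1, 1, 0, 1]
times = [2.0, 3.0, 5.0, 2.0]
rate_a = rate_from_totals(sum(correct), sum(times))
rate_b = rate_from_means(correct, times)
```

Both expressions give 0.25 correct responses per second here, since dividing numerator and denominator by N leaves the ratio unchanged.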

The  first  method  of  improving  on  the  formula  might  be  to  compute  the  mean  time 
for  only  the  correct  responses.  If  two  examinees  take  the  test,  and  one  of  them  is  able 
to  answer  an  item  wrong  twice  as  fast  as  the  other  examinee  answers  it  wrong,  it  is 
not  clear  that  the  first  examinee  should  be  scored  higher  on  the  test.  It  seems  plausible 
that  incorrect  responses  should  be  eliminated  from  the  scoring. 

The second modification to the formula would eliminate "outliers" from the
computation of the mean item response time. It is not uncommon for an examinee taking a
speeded test to be distracted or pause to ask a question of the proctor. Extraordinarily
long response times should be identified and omitted from the scoring.

One potential problem with computing the sample mean of the item times is that
the distribution of times is highly skewed. In general, the sample mean of a skewed
distribution is not necessarily a good estimate of the population mean. Thus, a third
approach to improving reliability of scoring would be to seek some transformation of
the times that makes their distribution more nearly normal. Some possibilities that have
been suggested in the literature are $\log T_i$, $\sqrt{T_i}$, and $1/T_i$. To these can be added
$1/\sqrt{T_i}$. For each transformation, one can construct a corresponding "rate" measure by
computing the mean of the transformed times and then transforming the mean back onto
the time scale to get a value to replace $\bar{T}$ in equation (2). Thus, the rate measures
considered here are $P_c/\bar{T}$, $P_c/\exp(\overline{\log T})$, $P_c \cdot \overline{1/T}$, $P_c/(\overline{\sqrt{T}})^2$, and
$P_c \cdot (\overline{1/\sqrt{T}})^2$, where a bar denotes a mean taken over items.

METHOD

Eighty-five randomly chosen recruits at the Recruit Training Center, San Diego,
were administered two alternate forms of Coding Speed and two alternate forms of
Numerical Operations presented on Apple /// personal computers. The order of testing
was randomized. The CS tests contained 12 sets of seven items per screen, and the NO
tests contained 17 sets of three items per screen. The onset of a new screen was
synchronized to the computer so that stimulus timing began when the electron beam was
in the upper left corner of the monitor. Timing of key presses was measured with a
hardware clock with millisecond accuracy. Pascal software that was used to detect key
presses introduced constant errors on the order of a tenth of a second. Each test was
administered with a 16-minute time limit, which is more than twice the usual time.
This was sufficient for most, but not all, subjects to complete the tests.

Outliers were defined as times whose logarithms were more than three standard
deviations above the mean log time for that examinee. Reliabilities were computed for
all responses, correct responses only, and correct responses trimmed for outliers.
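The outlier rule above can be sketched in a few lines of Python. This is a sketch under assumptions: the sample standard deviation is used (the source does not say whether the sample or population form was applied), times are assumed to be positive, and the function name is hypothetical.

```python
import math
import statistics


def trim_outliers(times, n_sd=3.0):
    """Drop times whose logs exceed the examinee's mean log time by n_sd SDs.

    Only the upper tail is trimmed, as in the definition above.
    Needs at least two times, because a sample SD is computed.
    """
    logs = [math.log(t) for t in times]
    cutoff = statistics.mean(logs) + n_sd * statistics.stdev(logs)
    return [t for t, lg in zip(times, logs) if lg <= cutoff]


# Made-up example: twenty 2-second responses and one 60-second pause.
times = [2.0] * 20 + [60.0]
kept = trim_outliers(times)
```

With few items per examinee a single long time cannot exceed three standard deviations, so the rule only has an effect on reasonably long tests.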

Skewness and kurtosis statistics were computed for individual item times in Form
A of the Coding Speed test, along with four transformations of the times. For all tests,
means of the times and their transformations were computed, and five different "rate"
scores were obtained. Skewness and alternate form reliabilities were computed for the
five means and the five rate scores.


RESULTS 

Table  1  shows  the  alternate  form  reliabilities  of  the  Greaud  and  Green  rate  scores 
when  all  item  times  are  scored,  when  only  correct  responses  are  scored,  and  when 
correct  responses  are  trimmed  at  the  upper  tail  for  outliers.  Restricting  the  scoring  to 
correct  responses  appeared  to  have  no  effect  on  reliability,  but  trimming  outliers  raised 
reliability  somewhat. 


Table 1

Effect of Trimming Incorrect Responses and Outliers
On Reliability of Rate Scores

                               Coding     Numerical
                               Speed      Operations

All Responses                   .81          .72
Correct Responses               .80          .73
Trimmed Correct Responses       .83          .75


Table 2 summarizes the skewness and kurtosis characteristics of 84 Form A Coding
Speed item times and their transformations. As expected, the times were quite
skewed, and in addition, had considerable kurtosis. Taking logs of the times eliminated
the skewness, and also reduced kurtosis. The reciprocal transformation made matters
much worse. The square root transformation reduced skewness, but not as much as
the logarithms. Taking the reciprocal of the square roots made skewness and kurtosis
worse. From these data, it appears that the log transformation is best, and that the
reciprocal should not be used to normalize the data.


Table 2

Mean Skewness and Kurtosis of 84 Coding Speed
Item Times and Their Transformations

                 Skewness    Kurtosis

T_i                0.99        1.47
log T_i           -0.03        0.84
1/T_i              1.27        5.50
√T_i               0.51        0.62
1/√T_i             0.64        2.55
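The kind of comparison summarized in Table 2 can be sketched in Python. The formulas below are the common moment-based definitions (skewness as the third standardized moment, kurtosis as the fourth standardized moment minus 3); that the paper used exactly these forms is an assumption, and the function names and data are made up.

```python
import math


def _central_moment(values, order):
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardized moment (0 for a symmetric sample)."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (0 for a normal distribution)."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3.0


# Made-up response times with one long value in the upper tail.
raw = [1.0, 2.0, 3.0, 4.0, 20.0]
logged = [math.log(t) for t in raw]
skew_raw = skewness(raw)
skew_log = skewness(logged)
```

In this made-up sample the log transformation pulls in the long upper tail and lowers the skewness, which is the direction of the effect reported for the actual item times.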

Table  3  shows  the  skewness  of  five  different  means  and  five  "rates"  based  on 
these  means.  Although  the  central  limit  theorem  merely  implies  that  the  means  of  the 
times  should  be  normally  distributed  within  each  individual,  it  is  still  somewhat 
surprising  that  the  mean  times,  reciprocals,  and  square  roots  are  significantly  skewed 
across  individuals.  Again,  the  log  transformation  showed  the  least  skewness.  None  of 
the  rate  measures  were  skewed  in  the  Coding  Speed  tests,  and  all  of  the  rates  were 
significantly  skewed  in  the  Numerical  Operations  tests. 


Table 3

Skewness of Alternative Scoring Formulas

                              Coding Speed         Numerical Operations
                            Form A    Form B        Form A    Form B

Mean T_i                     0.84*     1.46*         0.93*     0.52*
Mean log T_i                 0.14      0.31          0.16     -0.15
Mean 1/T_i                   0.64*     0.41          0.56*     0.77*
Mean √T_i                    0.48      0.81*         0.54*     0.17
Mean 1/√T_i                  0.21      0.08          0.20      0.46
P_c/(Mean T_i)              -0.18      0.07          0.52*     0.61*
P_c/Exp(Mean log T_i)       -0.13      0.05          0.56*     0.66*
P_c(Mean 1/T_i)             -0.00      0.03          0.60*     0.72*
P_c/(Mean √T_i)^2           -0.17      0.06          0.54*     0.64*
P_c(Mean 1/√T_i)^2          -0.08      0.04          0.58*     0.69*

* Significant at p < .05 by 2-tailed t-test.


Table 4 is an expanded version of Table 1, in which mean times, mean log times,
and rates based on mean log times are also shown. For all measures, it appears that
restricting scoring to correct responses did not improve reliability. Eliminating outliers
did improve reliability. The log transformation improved reliability if outliers were not
trimmed, but not otherwise. Trimming outliers improved reliability if untransformed
times were used, but not otherwise. Rate scores were more reliable than average times
for Coding Speed, but not for Numerical Operations.


Table 4

Reliabilities of Alternative Scoring Functions

                                 Coding Speed    Numerical Operations

All Responses
  Mean Time                          .76               .74
  Mean Log Time                      .79               .75
  P_c/Mean Time                      .81               .72
  P_c/Exp(Mean Log Time)             .82               .75

Untrimmed Correct Responses
  Mean Time                          .74               .74
  Mean Log Time                      .78               .76
  P_c/Mean Time                      .80               .73
  P_c/Exp(Mean Log Time)             .81               .75

Trimmed Correct Responses
  Mean Time                          .77               .76
  Mean Log Time                      .78               .76
  P_c/Mean Time                      .83               .75
  P_c/Exp(Mean Log Time)             .82               .76
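Alternate form reliabilities such as those in Table 4 are conventionally obtained as the correlation between examinees' scores on the two forms. The Python sketch below assumes a Pearson product-moment correlation, which the paper does not state explicitly; the function name and scores are made up.

```python
import math


def alternate_form_reliability(form_a, form_b):
    """Pearson correlation between paired scores on two alternate forms."""
    n = len(form_a)
    mean_a = sum(form_a) / n
    mean_b = sum(form_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(form_a, form_b))
    ss_a = sum((x - mean_a) ** 2 for x in form_a)
    ss_b = sum((y - mean_b) ** 2 for y in form_b)
    return cov / math.sqrt(ss_a * ss_b)


# Made-up rate scores for four examinees on Form A and Form B.
scores_a = [1.0, 2.0, 3.0, 4.0]
scores_b = [1.0, 3.0, 2.0, 4.0]
reliability = alternate_form_reliability(scores_a, scores_b)
```

Applying the same function to each scoring formula in turn (mean time, mean log time, and the corresponding rates) would give a table of the same shape as Table 4.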

DISCUSSION 

The  results  presented  in  this  paper  are  only  preliminary:  more  subjects  remain  to 
be  tested,  and  additional  analyses  need  to  be  performed,  particularly  on  the  Numerical 
Operations  items.  Nevertheless,  certain  conclusions  and  directions  for  future  work 
stand  out: 

One successful method for improving reliability is to eliminate outliers. Future
work should explore this method in more detail. The optimal cutting point for
trimming the data is a question that needs to be answered empirically.

Another method that improved reliability as much as trimming outliers was the
log transformation of times. So far, there is no indication that both methods combined
are better than one of them. However, combining both methods does not decrease
reliability, and may inspire greater confidence. In the end, considerations of computational
speed and program complexity may be the determining factors in deciding which
method(s) to use.

In  all  of  the  work  described  here,  it  has  been  implicitly  assumed  that  the  items 
are  homogeneous  in  difficulty,  and  only  individual  differences  have  been  examined. 
The  next  step  in  the  research  should  be  to  examine  items,  and  to  develop  a  model  that 
encompasses  both  item  characteristics  and  individual  differences  in  ability.  This 
approach should be especially useful in Numerical Operations, where addition,
subtraction, multiplication, and division have quite different average response times.


REFERENCE  NOTES 

Greaud, V.A. & Green, B.G. (1985). Equivalence of conventional and computer presen¬

BACKGROUND

Computerized adaptive testing is being considered for use in military selection and
classification. One research question under investigation concerns whether medium of
administration affects examinee attitudes toward the current Armed Services Vocational Aptitude
Battery (P&P-ASVAB). Recognizing the importance of this, the Defense Advisory Committee on
Military Personnel Testing recommended that the reactions of examinees to the computerized
adaptive version of that test (CAT-ASVAB) should be systematically collected and analyzed
(Linn, Bond, Britell, Campbell, Jaeger, Novick, and Uhlaner, 1983).

Early researchers reported on particular aspects of the relation between attitudes and
computerized testing. For example, Hedl, O'Neil, and Hansen (1973) showed that less
favorable reactions of the subjects to a computerized test were due to a lack of clarity in the
instructions and unfamiliarity with computer terminals. In a related study, Walther and O'Neil
(1974) found that subjects with greater test anxiety or negative attitudes toward computers
performed more slowly and made more errors on a test than subjects with lower levels of test
anxiety. More recently, Nilles, Carlson, Gray, Hayes, Holmen, and White (1980) supported the
view that computer usage generates anxiety which negatively impacts on user attitudes and
performance.

Other researchers present a different view. Schmidt, Urry, and Gugel (1978) found that
examinees completing a computerized adaptive test had positive attitudes toward it. In a study
on computer-managed instruction, Robinson, Tomblin, and Houston (1982) also reported
positive user attitudes.

More specifically for ASVAB testing, Mitchell, Hardwicke, Segall, and Vicino (1983)
reported generally positive attitudes of male Navy recruits toward taking the CAT-ASVAB.
Some attitudes corresponded with subjects' level of experience with computers or keyboards.
For example, those with "little to none" computer experience were more likely to indicate that
computerized testing was more impersonal than paper-and-pencil testing. Subjects with "little
to none" keyboard experience were generally more likely to express uneasiness about taking a
test on a computer and to indicate that the computerized test was more difficult. This research
showed that while computerized testing may be considered impersonal by some, this perception
does not imply a negative attitude toward computerized tests. These findings were supported in
the service-wide study of Hardwicke and Yoes (1984).

PURPOSE 

Investigations of user attitudes toward computerized testing can benefit from clearer
attitudinal items and improved experimental designs. Furthermore, little is known about the
relation between attitudes toward ASVAB testing and examinee ability or such background
variables as race and sex. The purpose of this study was to assess the effects of the medium of
administration on attitudes toward ASVAB testing. While some research does exist in this
area, this research is still exploratory. Therefore, the research objectives were to:

(1)  Examine  the  effect  of  medium  of  administration  on  examinee  attitudes  toward  testing; 

and 


(2)  Determine  if  this  relation  differs  as  a  function  of  sex,  race,  or  ability. 

METHOD 

This is part of a larger study investigating several aspects of subgroup differences in
ASVAB testing. While a total of 3,094 Navy recruits were tested under the larger study, only
the data for 619 recruits tested at the Recruit Training Center in Orlando, Florida were
available at the time of this presentation. Therefore, findings presented here are preliminary. The
sample was 60.9% male and 39.1% female. Also, 49.4% were White, 18.7% were Hispanic, and
31.7% were Black.

Test proctors selected all recruit companies of men and women which began training
between April and June of 1985. Then proctors randomly selected recruits from a company for
testing and randomly assigned them to complete either the P&P-ASVAB or the CAT-ASVAB
first. These were the Medium of Administration conditions. Test proctors collected the
background information regarding examinee sex and race from each subject's enlistment form (DD
Form 1966). They also obtained pre-enlistment scores on the Armed Forces Qualification Test
(AFQT) from that form.

After completing the first version of the ASVAB (i.e., P&P or CAT), subjects responded
on a seven-point scale to nine questions presented in a paper-and-pencil format. These were
the dependent variables in the present study. While recruits did complete the other ASVAB
version also, data from the second version are not relevant to this attitude study and are not
reported here.

RESULTS 

Responses to each of the nine attitude questions served as the dependent variables in a
three-factor analysis of variance: Medium by Sex by Race. The significance of the AFQT
covariate was obtained for each dependent variable. An analysis of covariance was performed
for items with significant AFQT covariates.

For each attitude question, Table 1 shows the number of respondents (N), the overall
mean, and the cell means for significant main effects in the analysis of variance. Figure 1
depicts the significant interaction effects involving medium. Table 2 shows each attitude question
and the distribution of responses. For significant medium effects, the distribution is shown for
each group.

Only question 4 had a significant AFQT covariate. When AFQT was entered into an analysis of
covariance, two of the previously significant effects were no longer significant: the main effect
of Race and the interaction effect of Sex by Race.

DISCUSSION 

While these are preliminary findings, six of the nine attitude questions had significant
main effects of Medium, all favoring the CAT-ASVAB. Those taking the CAT-ASVAB felt
better about taking the test battery. In addition, they felt less tired, they experienced less eye
strain, and they thought the test battery was shorter. These results are probably due to the
smaller number of items required by the CAT-ASVAB. In general, the CAT-ASVAB measures
the same content areas as the P&P-ASVAB with about half the number of items.

Furthermore, those taking the CAT-ASVAB felt more relaxed during the test. Perhaps
this is because they proceeded at their own pace while those taking the P&P-ASVAB had
specified time limits. Also, those taking the CAT-ASVAB reported that the instructions were
clearer. This may be attributable to the interactive nature of the instructions, which included
immediate feedback during a keyboard familiarization sequence and sample questions.

Three questions showed no main effects of medium. These indicated no differences
between administration conditions on anxiety, perceived difficulty of the questions, and
perceived fairness.


Table  1 

Cell Means for Main Effects

Item 

Attitude 

N 

Overall 

Medium* 

Sex* 

Race* 

1 

Overall  feelinp 

597 

2.7 

2.4 

30 

2 

Fatigue 

596 

3i 

40 

3J 

3 

Anxiety 

595 

3i 

4 

Question  Difficulty 

594 

39 

30 

4.0* 

4.0 

30 

30 

5 

Fairness 

595 

11 

6 

Tes'  Length 

593 

40 

3.4 

4i 

7 

Pressure 

593 

30 

2.7 

3 1 

8 

Eye  Fatigue 

593 

33 

3.4 

3.5 

3 1 

3  2 

3  2 

9 

Instruction  Clarity 

593 

li 

1.4 

1.6 

10 

1.4 

* The order of cell means is CAT-ASVAB, P&P-ASVAB.

* The order of cell means is Male, Female.

* The order of cell means is Black, Hispanic, White.

* This effect was not significant when AFQT was a covariate.


[Figure 1 plots not reproduced. Three panels, each plotted against medium of administration
on a vertical scale from 1 to 7: Question 2, Sex x Medium (Males, Females; 2 = Quite Tired,
6 = Quite Rested); Question 2, Race x Medium (Black, White, Hispanic; 2 = Quite Tired,
6 = Quite Rested); Question 4, Race x Medium (Black, White, Hispanic; 2 = Quite Easy,
6 = Quite Difficult).]

Figure 1. Interaction effects with Medium.



Table 2

Distribution of Responses to Attitude Questions

1. Overall, how did you feel about taking the test battery?

                   1          2          3          4          5          6          7
               extremely    quite     slightly   neither    slightly    quite    extremely
                 good        good       good                  bad        bad        bad
CAT-ASVAB:      16.4%       42.7%      26.3%       9.5%       4.7%       0.4%       0.0%
P&P-ASVAB:       5.6%       35.6%      26.9%      24.1%       5.3%       2.2%       0.3%

2. Overall, how tired did you feel at the end of the test?

                   1          2          3          4          5          6          7
               extremely    quite     slightly   neither    slightly    quite    extremely
                 tired       tired      tired                rested     rested     rested
CAT-ASVAB:       2.2%        8.4%      35.0%      22.6%      10.2%      16.8%       4.7%
P&P-ASVAB:       5.3%       19.9%      47.2%      18.0%       4.3%       4.3%       0.9%

3. During the test, how anxious did you feel?

                   1          2          3          4          5          6          7
               extremely    quite     slightly   neither    slightly    quite    extremely
                 calm        calm       calm                anxious    anxious    anxious
Overall (a):     8.4%       29.6%      11.6%      17.5%      25.0%       6.2%       1.7%

4. What is your opinion of the difficulty of the questions?

                   1          2          3          4          5          6          7
               extremely    quite     slightly   neither    slightly    quite    extremely
                 easy        easy       easy               difficult  difficult  difficult
Overall (a):     1.0%       14.5%      20.4%      28.8%      31.8%       3.3%       0.0%

5. How fair do you feel the test was?

                   1          2          3          4          5          6          7
               extremely    quite     slightly   neither    slightly    quite    extremely
                 fair        fair       fair                 unfair     unfair     unfair
Overall (a):    23.9%       55.1%       7.7%       9.6%       2.4%       0.8%       0.5%

6. What is your opinion of the length of the test battery?

                   1          2          3          4          5          6          7
               extremely    quite     slightly   neither    slightly    quite    extremely
                 short       short      short                 long       long       long
CAT-ASVAB:       6.6%       17.9%      21.2%      38.3%      13.9%       1.8%       0.4%
P&P-ASVAB:       0.9%        5.0%      10.0%      33.8%      30.0%      13.8%       6.6%

7. During the test, how relaxed or pressured did you feel?

                   1          2          3          4          5          6          7
               extremely    quite     slightly   neither    slightly    quite    extremely
                relaxed     relaxed    relaxed             pressured  pressured  pressured
CAT-ASVAB:      21.2%       35.0%      17.2%      12.0%      12.8%       1.8%       0.0%
P&P-ASVAB:      10.3%       29.8%      17.6%      16.9%      20.7%       3.8%       0.9%

8. During the test, how strained or tired did your eyes feel?

                   1          2          3          4          5          6          7
               extremely    quite     slightly   neither    slightly    quite    extremely
                 tired       tired      tired                rested     rested     rested
CAT-ASVAB:       7.7%       17.2%      37.6%      17.5%       5.8%       9.1%       5.1%
P&P-ASVAB:       8.2%       22.9%      33.9%      21.0%       5.3%       6.9%       1.9%

9. How clear do you feel the instructions were?

                   1          2          3          4          5          6          7
               extremely    quite     slightly   neither    slightly    quite    extremely
                 clear       clear      clear              confusing  confusing  confusing
CAT-ASVAB:      68.2%       29.2%       0.7%       0.7%       1.1%       0.0%       0.0%
P&P-ASVAB:      53.0%       41.7%       1.6%       1.6%       0.9%       0.6%       0.6%

(a) For comparisons with non-significant medium effects, the distribution of responses is
shown for the combined CAT-ASVAB and P&P-ASVAB groups.

The other main and interaction effects are not so readily interpreted. A cultural bias
hypothesis would favor White males over all other subgroups. Since this result occurred for
only four of the remaining nine main and interaction effects, these results do not support a
cultural bias hypothesis.

When all the data are ready, analyses will include measures of computer knowledge and
test performance. Then, studies will investigate the relations between attitudes, computer
knowledge, biographical characteristics, ability, and test performance.

STATISTICAL PROCESS CONTROLS AS AN ENHANCEMENT TO JOB DESIGN


Steven  L.  Dockstader 

Navy  Personnel  Research  and  Development  Center 

Abstract 

The paper presents an argument for adopting statistical
process control as an approach for improving productivity in
organizations. The fundamental problems of resistance to
change are discussed and job changes resulting from SPC are
logically analyzed in terms of job characteristics theory.

It is concluded that performance increases can be expected
from the approach, but whether they are on the ability or
motivational dimension is unclear and should be the subject
of future research.


Introduction 

The  purpose  of  this  paper  is  to  attempt  to  provide  an  explanation  for 
the  varying  degrees  of  success  found  for  projects  seeking  to  improve 
productivity  through  changes  in  quality  controls.  It  is  the  contention  of 
this  author  that  productivity  derived  from  quality  improvement  is  the 
result  of  (a)  leadership  initiatives  which  support  a  change  in  the  methods 
of  quality  control  and  (b)  factors  intrinsic  to  the  methods  which 
contribute  to  both  their  acceptance  and  incentive  value  to  the  individual. 

There  are  two  basic  approaches  used  to  achieve  product  or  service 
quality.  The  methods  vary  on  the  extent  to  which  the  responsibility  for 
quality  is  vested  in  the  producer  of  the  product  or  service  or  with  an 
external  agent,  such  as  a  quality  checker.  The  methods  can  be  described 
schematically  as  follows: 


I.   Work Process  -->  Product/Service
              (Measurement)

II.  Work Process  -->  Product/Service  -->  Inspection  -->  Delivery


Approach I will be referred to as the process control approach, while
approach II will be referred to as the product inspection approach.
Proponents of process control contend that it leads to higher levels of
quality and lower costs than the product inspection approach. Sherkenbach
(1984) has argued that greater efficiencies are achieved by the process
control approach because the methods, over time, eliminate the problems in
the (production) system which result in defects. This, in turn, eliminates
waste, rework, and the need for inspectors. Juran (1974), in a massive
review of process control work, has pointed out that systems problems
account for the vast majority of quality defects--as high as 85%--and that
the systems problems can best be dealt with by the process control
approach.

In the following discussion, Lawler's seven sources of resistance will
be  considered  in  turn  for  the  operations  of  a  large  aircraft  overhaul 
facility  in  the  Navy.  Although  the  observations  made  here  are  not  based 
upon  empirical  data,  they  have  been  corroborated  by  senior  managers  in  that 
organization  and  by  on-site  research  personnel. 

1.  The  process  control  approach  does  measure  performance  in  new  areas. 
In  fact,  the  essence  of  the  approach  involves  measuring  several  significant 
features  about  the  production  process  prior  to  completion  of  a  product  or 
service.

2.  Ultimately  many  of  the  personnel  who  are  currently  used  in  the 
quality  control  department  will  be  deployed  to  other  parts  of  the 
organization,  or  be  conducting  quality  control  activities  not  currently 
being  performed  (e.g.,  incoming  supplies,  customer  services,  etc.).  This 
displacement  and/or  retraining  of  personnel  is  viewed  as  a  threat  by  those 
in  the  current  quality  control  function. 

3.  Quality  control  standards  are  usually  established  by  engineers  or 
quality  technicians.  In  the  case  of  process  control,  however,  a  fixed 
standard  has  no  meaning.  Control  is  defined  by  taking  actions  to  keep  the 
process  within  variability  limits  which  are  determined  by  the  process 
itself.  Because  the  limits  change  as  a  function  of  improvements  in  the 
system,  no  fixed  standard  can  be  applied. 

4.  Using  the  process  control  approach,  the  basic  data  is  collected  by 
the  performer.  In  this  sense,  feedback  is  immediate.  Furthermore,  because 
the  information  gathered  is  typically  a  historical  record  with  relational 
information  on  the  record  (e.g.,  a  control  chart),  the  worker  can  evaluate 
the  data  and  determine  what  action,  if  any,  need  be  taken. 

5.  Whether  the  data  is  fed  to  higher  levels  and  used  within  the  reward 
system  depends  upon  a  number  of  factors.  The  most  significant  is  the 
degree  to  which  the  worker  has  discretion  to  make  decisions  concerning 
corrections  to  the  system.  This,  in  turn,  is  usually  based  upon  the  extent 
of  the  system  changes  and  their  costs,  but  could  also  be  a  reflection  of 
the  management  philosophy  of  the  organization.  This  will  be  considered  in 
greater  detail  in  a  subsequent  discussion. 

6.  This is the "status quo" factor, and it can be said that a change
in the inertial state of the organization will be determined by whether or
not a "critical mass" (Deming, 1985b) can be developed to overcome the
status quo. Most large bureaucratic organizations are in a steady state of
inertia and resist change under normal workload conditions.

7.  It  is  difficult  to  assess  this  factor.  The  people  most  affected  by 
the  quality  control  system  are  those  in  the  "production"  area.  As  a  group, 
they  are  the  largest  in  number  and  exert  the  greatest  influence  on 
achieving  the  mission  of  the  organization.  However,  under  the  current 
product  inspection  approach,  they  receive  the  most  censure  when  product 
quality  does  not  meet  specifications/test.  Managers  have  been  of  the 
opinion  that  this  has  led  the  workers  to  lose  identity  with  the  quality  of 
their  products  because  someone  else  has  been  responsible  for  detecting  it. 
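
Items 3 and 4 above turn on control limits that come from the process's own variation rather than from a fixed engineering standard. A minimal sketch in Python follows, assuming the common Shewhart convention of the mean plus or minus three standard deviations. The paper does not name a specific chart type, and the function names and sample measurements are hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    """Return (lower, center, upper) limits computed from the process's own data."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation of the baseline runs
    return center - sigmas * spread, center, center + sigmas * spread


def out_of_control(points, limits):
    """Return the measurements that fall outside the control limits."""
    lower, _, upper = limits
    return [x for x in points if x < lower or x > upper]


# Hypothetical baseline measurements recorded by the worker (illustrative only).
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
limits = control_limits(baseline)

# New measurements are judged against limits derived from the process itself.
flagged = out_of_control([10.0, 10.6, 9.9], limits)
```

Because the limits are recomputed from recent data, they tighten as the system improves, which is why no fixed standard applies under this approach.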

Lawler has indicated that, to the extent that these factors hold for
workers, they will engage in non-productive or even counterproductive
behaviors.  Using  his  analysis  and  the  previous  discussion,  it  appears  that 
the  process  control  approach  should  meet  less  resistance  in  terms  of 
factors  3,  4,  6,  and  7.  That  is  to  say  that  it  (a)  does  not  deal  with 
standards  per  se,  (b)  provides  feedback  information  to  the  performer,  (c) 
is  of  greater  benefit  to  most  of  the  workforce  than  the  existing  system  and 
(d)  can  enhance  the  self  esteem  of  the  worker  as  he  begins  to  take  charge 
of  the  quality  of  his  work. 

Of  the  other  factors,  only  the  second  appears  to  be  of  significant 
concern  in  terms  of  resistance  to  change.  In  the  organization  under  study, 
quality  control  is  vested  in  a  functional  department.  While  performing 
inspections  or  audits  of  the  work  conducted  in  the  production  area  is  not 
their  only  function,  it  does  define  their  central  raison  d'etre.  In 
addition,  this  organization  is  one  of  several  which  reports  to  a 
headquarters.  Both  the  headquarters  and  the  sister  organizations  contain 
quality  control  functions  based  upon  inspection  and  audit.  Resistance  here 
would have to be overcome.

Factors  1  and  5  are  potentially  areas  of  resistance  because  of  the  new 
measures  and  added  work  required  (1)  and  because  the  information  could  be 
used  to  evaluate  performance  of  the  workers  (5).  Neither  of  these  is 
necessarily  negative,  but  the  workforce  is  often  wary  concerning  the  use  of 
performance measures. If the management philosophy and the culture of the
organization reward improvement, then there will be little
resistance.


Process  Control  and  Worker  Motivation 

Our  discussion  thus  far  has  focussed  on  the  desirability  of  changing 
from  product  inspection  to  process  control  and  the  nature  of  resistances  in 
making  such  a  change.  While  it  appears  obvious  that  such  a  change  is  both 
desirable and feasible from a management standpoint, what is in it for the
worker?  After  all,  with  the  exception  of  some  of  the  existing  quality 
control  personnel,  the  major  job  changes  will  be  that  of  the  worker  and 
perhaps  his  immediate  supervisor.  If  this  change  is  not  seen  by  the  worker 
as  having  incentive  value,  then  it  will  very  likely  be  resisted. 

Job  Characteristics  Theory  (Hackman  &  Lawler,  1971;  Hackman  &  Oldham, 
1976) provides a conceptual framework to evaluate the design of a worker's
job  to  include  the  process  control  approach  to  quality  control.  The  theory 
is  based  upon  a  plethora  of  research  which  has  revealed  that  there  are 
three  psychological  states  which  contribute  to  worker  motivation.  These 
are  feelings  of  meaningfulness,  responsibility,  and  knowledge  of  results. 
The  theory  goes  on  to  describe  what  job  characteristics  will  result  in 
these feelings. The theoretical relationships can be schematized as
follows:

Considering each of these characteristics in turn, we can determine the
motivating potential, or incentive, of the worker's job when process control
becomes  a  part  of  the  job.  Skill  variety  is  obviously  increased  because 
the  job  will  now  involve  collection  of  data,  charting  of  data,  and 
reporting  process  aberrations.  Task  identity  should  increase  because 
attention will be focussed on aspects of the process which previously
received less formal attention (e.g., measurement). The perceived
significance  of  the  task  may  also  be  enhanced  because  taking  process 
control  actions  should  occasion  interaction  with  supervisors,  staff,  and 
managers  which  would  not  ordinarily  occur.  Autonomy  will  be  increased 
because  quality  control  actions  and  responsibilities  will  now  be  formally 
placed  in  the  hands  of  the  worker.  Finally,  feedback  will  be  immediate  in 
terms of the things being measured. Feedback as a result of process
changes  will,  in  most  cases,  be  immediate  as  well. 
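
Job Characteristics Theory usually summarizes the five characteristics discussed above in a single Motivating Potential Score (MPS). The paper does not state the formula; the sketch below uses the standard Hackman and Oldham form, in which the three "meaningfulness" characteristics are averaged and then multiplied by autonomy and feedback. The 1-to-7 ratings for a job before and after process control are hypothetical values chosen only for illustration.

```python
def motivating_potential_score(skill_variety, task_identity, task_significance,
                               autonomy, feedback):
    """Hackman-Oldham MPS: mean of the three meaningfulness ratings,
    multiplied by the autonomy rating and the feedback rating."""
    meaningfulness = (skill_variety + task_identity + task_significance) / 3
    return meaningfulness * autonomy * feedback


# Hypothetical ratings (1 = low, 7 = high) for the same job under
# product inspection and under process control.
mps_inspection = motivating_potential_score(3, 3, 4, autonomy=2, feedback=3)
mps_process_control = motivating_potential_score(4, 4, 5, autonomy=4, feedback=5)
```

Because autonomy and feedback enter multiplicatively, a job rated low on either one has a low score no matter how varied or significant the work is. This is one reason the shift of quality responsibility and immediate feedback to the worker matters in the analysis above.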


From  this  logical  analysis  and  the  model  displayed  in  Table  1,  we  can 
predict  the  outcomes  displayed  there.  During  the  course  of  the  forthcoming 
year,  these  hypothetical  relationships  will  be  tested  in  the  Navy 
maintenance  environment.  The  use  of  process  control  as  a  method  to  enrich 
jobs has not received attention in the empirical literature, but the
aforementioned  analysis  suggests  that  it  should  be  an  effective  way  to 
motivate  workers  as  well  as  increase  the  quality  of  their  efforts. 


Locke  (1980)  has  indicated  that  job  enrichment  has  not  been  effective 
in  motivating  employee  performance,  when  the  effects  of  goal  setting  have 
been  controlled.  Such  a  confounding  of  variables  is  not  expected  in  the 
process  control  situation,  because  the  demand  characteristics  of  process 
control  are  not  upon  employee  effort,  but  upon  removal  of  system-generated 
variation.  Performance  increases,  then,  would  probably  result  from 
enhanced  ability  to  perform  the  job,  as  opposed  to  the  motivation  to 
increase  effort. 


This latter point raises the question of whether employee motivation
could be at all affected by adopting the process control approach.

Lawler's (1976) discussion of expectancy models of worker motivation would
suggest that, in the absence of extrinsic rewards for quality improvement,
the motivational impact can only be derived from (a) the E->P
(effort-to-performance) probability or (b) the valence associated with
outcomes other than extrinsic rewards. From our logical analysis above, it
is not clear which of the two, or some combination of both, could account
for changes in the motivating potential of a job enriched by inclusion of
process controls for quality. These questions will be addressed in future
research.


A GROUP WAGE INCENTIVE SYSTEM:
DESIGN AND IMPLEMENTATION ISSUES

San Diego, California 92152-6800


INTRODUCTION 

In  an  effort  to  improve  performance  and  reduce  costs  in  a 
naval  shipyard,  a  group  wage  incentive  system  for  production 
workers  was  developed,  implemented,  and  evaluated.  The  system 
was  designed  to  improve  performance  efficiency  without 
negatively  affecting  schedule  adherence,  product  quality,  or 
workers' job attitudes.

This  project  was  part  of  a  continuing  research  program  to 
investigate  the  effects  of  wage  incentive  systems  in  Navy 
industrial  facilities.  Previous  projects  evaluated  the  effects 
of  performance  contingent  reward  systems  (PCRSs)  with  a  variety 
of  civil  service  employees:  key  entry  operators,  small  purchase 
buyers,  and  aircraft  engine  mechanics.  Under  a  PCRS,  employees 
earn  cash  bonuses  (incentive  awards)  for  work  performed  above 
established  standards.  The  more  performance  exceeds  the 
standard,  the  larger  the  bonus.  PCRS  rewards  are  paid  through 
existing  award  programs,  are  recurrent  (being  accrued  as  often 
as performance exceeds standard), and are in addition to
employees'  base  salary. 

The  present  effort  differed  from  previous  projects  in  that 
awards were based on measures of group performance. Shipyard
production  workers  typically  work  together  in  teams  (called  work 
gangs)  of  10  to  20  employees  supervised  by  one  foreman.  Thus,  a 
PCRS  based  on  measures  of  group  performance  was  more  appropriate 
than  one  based  on  individual  performance  measures. 


INCENTIVE  SYSTEM  OVERVIEW 

The  performance  measure  used  for  this  system  was  one  of 
performance  efficiency.  It  was  calculated  by  dividing  the 
manhours  allowed  to  complete  a  work  gang's  jobs  by  the  manhours 
actually  expended  to  complete  the  work.  Thus,  when  work  is 
completed  in  exactly  the  time  allowed  for  that  work,  the  work 
gang's performance efficiency, called a performance factor (PF),
is 100 percent. When work is completed in less time than the
allowance, the gang's PF will be greater than 100 percent and
manhours will be saved. Both inputs to this measure (manhours
allowed and expended) were routinely collected by the shipyard's
management information system (MIS). Prior to implementation,
the  shipyard  MIS  was  further  enhanced  to  provide  more  accurate 
performance  measures  and  to  provide  monthly  automated  incentive 
award  calculations  and  continual  award  tracking. 
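
The performance factor described above is a simple ratio, and a short Python sketch may make the arithmetic concrete. The function names and the hour figures are hypothetical; only the definition (manhours allowed divided by manhours expended, expressed as a percentage) comes from the text.

```python
def performance_factor(hours_allowed, hours_expended):
    """PF in percent: manhours allowed divided by manhours actually expended."""
    if hours_expended <= 0:
        raise ValueError("hours_expended must be positive")
    return 100.0 * hours_allowed / hours_expended


def hours_saved(hours_allowed, hours_expended):
    """Manhours saved; negative when the gang overran its allowance."""
    return hours_allowed - hours_expended


# Hypothetical work gang: 1,000 manhours allowed, 800 expended.
pf = performance_factor(1000, 800)   # above 100 percent, so hours were saved
saved = hours_saved(1000, 800)
```

A gang that finishes exactly on its allowance has a PF of 100 percent and saves no hours, which matches the description above.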


Under  the  shipyard's  PCRS,  work  gangs  were  eligible  for 
awards  whenever  they  saved  manhours  by  completing  their  jobs  in 
less  time  than  the  manhours  allowed  for  those  jobs.  The  value 
of  these  saved  hours  was  shared  with  employees  in  the  form  of 
incentive  awards.  The  work  gang's  saved  hours  were  distributed 
to  members  based  on  each  worker's  contribution  to  the  workgang 
(his  or  her  share  of  the  gang's  total  work  hours).  Based  on  the 
50  percent  sharing  rate  used  during  the  system  test,  half  of  the 
cost  savings  associated  with  a  work  gang's  manhour  savings  were 
paid  out  to  gang  members  as  incentive  awards.  The  remaining  50 
percent  was  retained  by  the  shipyard.  The  actual  value  of  each 
saved hour was based on the employee's accelerated hourly wage
rate.
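
The distribution rule above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the shipyard's actual MIS calculation: the function name, worker labels, hours, and wage rates are hypothetical, and rounding to cents is an added assumption. The 50 percent sharing rate and the allocation by each worker's share of the gang's total hours are taken from the text.

```python
def distribute_awards(saved_hours, hours_worked, wage_rates, sharing_rate=0.5):
    """Split a gang's saved hours by each member's share of total hours worked,
    value them at that member's wage rate, and pay out the shared fraction."""
    total_hours = sum(hours_worked.values())
    awards = {}
    for worker, hours in hours_worked.items():
        member_saved_hours = saved_hours * hours / total_hours
        awards[worker] = round(member_saved_hours * wage_rates[worker] * sharing_rate, 2)
    return awards


# Hypothetical gang of two: 200 saved hours, unequal hours and wage rates.
awards = distribute_awards(
    saved_hours=200,
    hours_worked={"worker_a": 600, "worker_b": 200},
    wage_rates={"worker_a": 12.0, "worker_b": 10.0},
)
```

In this example worker_a is credited with 150 of the 200 saved hours and worker_b with 50; each receives half the value of those hours, and the shipyard retains the other half.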


A  similar  incentive  system  was  established  for  shop  foremen 
in  which  all  foremen  comprised  one  group  eligible  for  awards 
whenever  performance  of  the  entire  shop  resulted  in  manhour 
savings.  In  addition,  to  encourage  foremen  to  work  together  to 
help  the  shop  improve,  each  foreman  received  a  one-time  bonus  of 
$125  the  first  time  the  shop's  PF  exceeded  100  percent. 

Because  the  shop  selected  for  test  of  the  incentive  system 
historically  spent  many  more  manhours  to  complete  its  work  than 
were  allowed,  few  work  gangs  would  save  manhours  and  earn 
incentives  at  typical  performance  levels.  Since  incentive 
systems  do  not  motivate  employees  to  improve  performance  unless 
they  believe  it's  possible  to  earn  awards,  shipyard  managers 
decided  to  adjust  all  performance  measures  upward  by  10  percent 
for  the  purposes  of  subsequent  award  calculations.  Thus,  work 
gangs  actually  accrued  manhour  savings  whenever  their  PF 
exceeded 90 percent and foremen earned awards whenever the
shop's  PF  exceeded  90  percent.  Despite  this  adjustment,  the 
incentive  system  rewarded  employees  for  performance  improvement. 


IMPLEMENTATION 

Prior to implementation of the test system, a shipyard
instruction was issued documenting the incentive system and
specifying responsibilities during the test period. A senior
military officer was assigned as project manager and a general
foreman within the test shop served as system coordinator. An
agreement was negotiated with the local union and approval was
obtained from the appropriate headquarters commands. Finally,
employees and supervisors in the test shop were given training
to assure their understanding of the enhanced performance
measurement system and the group incentive system. The PCRS was
then implemented for test in Shop 31, the shipyard's inside
machine shop. Shop 31 is one of 17 shops at the shipyard and
employs approximately 480 wage grade employees and 23 foremen
assigned to 18 work gangs.


RESULTS


During  the  19  months  of  the  system  test  a  total  of  $177,000 
was  earned  by  employees.  Sixteen  of  the  eighteen  work  gangs 
(comprising  89  percent  of  shop  employees)  earned  awards.  Of 
those  employees  earning  awards,  the  average  total  earnings 
during  the  test  period  was  $419.  (Total  earnings  ranged  from  $1 
to  $2488.)  Foremen  earnings  averaged  $237  for  the  2  months 
that  the  shop’s  performance  factor  exceeded  90  percent. 

Evaluation  of  the  incentive  system  test  revealed  that  the 
program  produced  a  significant  increase  in  the  shop's 
performance efficiency (see Figure 1). For analysis purposes
the 19-month test was divided into two phases: Incentive Phase
1 consisted of the first eight 4-week incentive periods and
Incentive Phase 2 consisted of the remaining eleven 4-week incentive
periods. The shop showed a 7.5 percent improvement over average
baseline  performance  during  Incentive  Phase  2  (an  improvement 
from  91.4%  to  97.6%).  During  the  first  incentive  phase,  the 
shop  maintained  its  baseline  performance  despite  a  severe 
workload  reduction  (performance  averaged  89.0%).  Two  comparison 
shops  (see  Table  1)  showed  substantial  performance  decreases 
during  the  same  time,  although  their  workload  reductions  were 
less  severe  than  that  of  the  test  shop.  Since  the  end  of  the 
19-month  test,  the  test  shop  has  maintained  its  improved 
performance.  As  expected,  implementation  of  the  system  did  not 
cause  any  negative  effects  on  the  shop's  schedule  adherence  or 
product  quality. 

Participants'  job  attitudes  and  evaluations  of  the 
incentive  system  were  assessed  during  the  test.  Although 
recognizing  certain  problems  related  to  system  operation 
(particularly the effects of the workload reduction), 80% of
those  expressing  an  opinion  favored  continuing  the  incentive 
system.  No  positive  or  negative  effects  on  workers'  job 
attitudes  (e.g.,  job  satisfaction  and  job  stress)  were  found. 

A  cost  savings  analysis  revealed  that  the  net  cost  savings 
due  to  improvements  over  baseline  performance  during  the  system 
test  exceeded  $600,000.  If  similar  results  occurred  following 
expansion  of  the  system  to  all  other  production  shops,  the 
shipyard  could  realize  net  cost  savings  of  approximately 
$6,794,000  annually. 

The  shipyard  realized  a  number  of  concurrent  positive 
outcomes  from  the  system  test,  including  improvements  in  shop 
practices  and  initiation  of  management  actions  directed  toward 
resolving  productivity  impediments.  Foremen  began  taking 
greater  care  in  preparing  employee  time  cards,  correcting  labor 
mischarges,  and  reviewing  work  documents  before  beginning  jobs. 
As  a  result  of  the  interest  in  improvement  engendered  by  the 
incentive  system,  a  number  of  productivity  impediments  were 
highlighted  during  the  test.  These  impediments  were  not  new. 
Rather, they were long-standing shipyard problems that became
more salient when money was tied to performance. The incentive
system  provided  the  impetus  to  attack  these  problems  and  as  a 
result  a  shipyard-wide  problem-solving  team  was  established  and 
successfully  resolved  a  number  of  these  issues. 


IMPLEMENTATION  AND  MAINTENANCE  ISSUES 

Throughout  this  effort,  various  issues  surfaced  that 
revealed  the  complexity  of  designing  and  implementing 
productivity  improvement  systems  in  real  organizations.  While 
the  success  of  the  test  system  indicates  these  issues  can  be 
effectively  resolved,  nonetheless,  some  important  conclusions 
can  be  drawn  from  this  project. 

Several  issues  arose  during  the  design  of  the  incentive 
system.  The  decision  to  develop  a  group  system  was  a  logical 
result  of  analysis  of  the  shipyard's  work  settings. 
Implementation  of  a  typical  incentive  system  (most  likely  based 
on measures of individual performance) would have been
inappropriate  and  quite  possibly  ineffective.  Managers 
considering  a  PCRS  should  realize  that  no  standard  system 
exists.  The  PCRS  must  be  designed  to  fit  the  organization  and 
its  priorities. 

A  number  of  incentive  system  parameters  had  to  be  specified 
in the design phase, as well. These included: the incentive
level (the performance level at which employees were eligible to
earn awards), the sharing rate (the proportion of cost savings
shared with employees), and the savings distribution method (the
way savings were shared among work gang members). As discussed,
the  incentive  level  was  dropped  to  90  percent  in  the  belief  that 
this  level  would  be  seen  as  attainable  by  workers  and  that  it 
would  improve  the  motivating  potential  of  the  system.  The  50 
percent  sharing  rate  used  in  this  test  was  selected  because  it 
was  the  maximum  allowable  by  federal  regulations  and  because  it 
was  likely  to  be  perceived  as  fair  to  both  employees  and  the 
organization.  Distribution  of  savings  based  on  worker  inputs 
was  used  to  further  strengthen  participants'  perceptions  of 
fairness.  These  design  parameters  can  be  assumed  to  have  been 
effective  based  on  the  favorable  results  of  the  test  period. 
However,  there  is  no  way  to  determine  if  different  parameters 
might  have  been  substantially  more  effective.  Little  research 
has  been  done  to  investigate  the  effectiveness  of  different 
levels  of  these  parameters  (e.g.,  a  20%  vs.  a  50%  sharing  rate) 
or  the  situational  variables  that  require  parameter  changes. 

During  the  implementation  and  maintenance  phases, 
additional  issues  arose.  Primary  among  these  was  management's 
relationship  to  the  incentive  system.  The  success  of 
organizational  change  efforts  such  as  incentive  systems  is  at 
least  in  part  contingent  on  active  support  from  management.  A 
high degree of commitment to the program is necessary before and
after implementation, commitment involving more than just verbal
support. It is difficult to implement effective changes when
either top management or those expected to implement change are
unsupportive.

During  the  test  of  the  incentive  system,  the  shipyard 
experienced  a  rather  significant  workload  reduction.  This 
appeared  to  have  precluded  performance  improvement  until  the 
shop's  staffing  level  was  brought  into  balance.  Workers  are 
unlikely  to  improve  their  productivity  if,  by  so  doing,  they 
believe  they  will  risk  their  jobs.  In  implementing  performance 
improvement  programs,  managers  must  continually  address  the 
balance  between  workload  and  staffing  within  the  organization 
and must develop means to capitalize on the effects of resulting
improvement.

Tying money to performance highlighted a number of long-standing
shipyard problems that were subsequently addressed by a
problem-solving  team.  The  importance  of  tapping  this  increased 
interest  in  performance  improvement  cannot  be  overestimated. 
Many  shipyard  managers  believed  that  the  incentive  system's 
major  benefits  were  in  encouraging  supervisors  to  do  their  jobs 
and  in  focusing  efforts  on  resolving  productivity  impediments. 
Such  auxiliary  benefits  of  incentive  systems  should  not  be 
overlooked. 

Finally,  the  issue  of  incentive  system  expansion  surfaced. 
To  continue  to  run  a  test  system  in  only  one  shop  is  not 
feasible.  With  proven  cost  savings  resulting  from  performance 
improvement  the  next  logical  step  is  to  expand  to  other  shops. 
Managers  must  carefully  consider  how  and  how  far  to  expand  a 
successful  incentive  system.  During  expansion,  care  must  be 
taken  to  adapt  the  system  to  other  sites  and  to  continue  to 
monitor  its  effectiveness.  Managers  should  also  consider  means 
to  include  production  support  and  other  indirect  workers  in  such 
incentive  systems. 

Although  there  are  a  substantial  number  of  complex  issues 
that  must  be  faced  in  developing  and  implementing  wage  incentive 
systems,  their  proven  effectiveness  indicates  that  such  efforts 
are  worthwhile.  Further  research  to  identify  effective  design 
parameters,  the  increased  use  of  automation  to  support  wage 
incentive  systems,  and  the  benefits  that  can  be  derived  from 
additional  experience  with  these  systems  will  help  to  limit  the 
effort  required  to  design  and  implement  wage  incentive  systems 
in  the  future. 


The  opinions  expressed  in  this  paper  are  those  of  the  author  and 
should  not  be  construed  as  official  or  as  reflecting  the  views 
of  the  Department  of  the  Navy. 
Figure 1

Performance and Workload Trends for Three Key Production
Shops During Baseline and Two Incentive Phases

Workload: Average Man-day Allowances per 4-week Period

Incentive Phase 2

a Baseline: 10 January 1983 - 14 July 1983.
b Incentive Phase 1: 13 July 1983 - 27 January 1984.
c Incentive Phase 2: 28 January 1984 - 30 November 1984.
d Figures represent the average PF within each time frame.

ABSTRACT

A goal setting program was implemented in a Navy industrial organization that used engineered performance
standards. Results indicated that workers whose baseline performance was below standard set more difficult goals
and improved their performance more than high performing workers. Discussion centered on the role of context in
influencing goal setting effectiveness in Navy organizations.

BACKGROUND 

There is growing concern in the United States with what has been labeled the "U.S. Productivity Crisis" (Newsweek, 1983).
This crisis is manifested in the declining rate of growth in the output per hour of labor. The United States finished well behind
six other industrial nations in productivity increases from 1968 to 1978 (Bureau of Labor Statistics, 1979). Within the Navy,
concern over worker productivity has created increasing interest in productivity improvement at all levels of the organization.

Traditionally, productivity programs in both the military and civilian sectors have centered on technological improvements
and capital investments. While the importance of these hardware-oriented approaches is obvious, there is a growing body of
organizational literature that suggests that significant productivity improvements can be realized through improved worker
motivation (Greiner, Hatry, Koss, Millar, & Woodward, 1981). Several different techniques have been investigated, including
autonomous work groups, job restructuring, participative management, and monetary incentive systems. Each of the above
approaches has been shown to have merit under differing circumstances (Cummings & Molloy, 1977; Patten, 1977).

Goal  Setting 

Goal setting is an area of organizational research that seems to be especially promising in terms of enhancing worker
motivation and performance. Research has shown that goals are a major source of work motivation (Mitchell, 1979). Likewise, a
recent review of field studies using goal setting techniques found a 16% median improvement in worker performance (Locke,
Feren, McCaleb, Shaw, & Denny, 1980). Based on the results of a number of highly successful lab and field studies, goal setting
has been called "a simple, straightforward, and highly effective technique for motivating employee performance" (Latham &
Locke, 1979, p. 80).

While it is clear that goal setting can be an effective motivational technique, we feel that comparatively little study has
been directed toward careful investigation of the possible limitations of the approach. There are very likely no panaceas in any
field of applied science (Locke, Sirota, & Wolfson, 1976). As such, it would seem that goal setting theory is subject to boundary
limitations regarding to whom it applies and where it works best (Miner, 1980). The current study addressed this issue by
examining the effectiveness of goal setting in an industrial organization that made extensive use of engineered performance
standards.

The use of task standards is an outgrowth of the basic tenets of scientific management (see Taylor, 1967). These standards
represent the time a trained employee working at a normal pace would be expected to take to complete a given task. They are
usually based on time and motion studies or on historical performance trends. Within many industrial organizations work
standards have been established for most production jobs. While these standards are used for advance cost estimates, manpower
projections, and other planning requirements, they also serve another implicit function: they establish acceptable performance
levels for workers (Maynard, 1971). In this sense, a standard is a goal for workers to try to achieve (Locke, 1978).

If achieving standards represents an acceptable performance level, then industrial organizations that make extensive use of
task standards may encounter problems in implementing goal setting programs for workers. The basic proposition of goal setting
theory states that there is a positive relationship between the difficulty of an accepted task goal and level of performance on the
task (Locke, 1968). Considerable research has shown that hard, specific goals (if accepted) result in performance improvements
(Locke, Shaw, Saari, & Latham, 1981). Performance standards certainly define specific goals; however, they may not always be
difficult. While performing at standard level may be challenging for employees with low ability and work motivation, it would not
represent a challenging goal for a motivated and highly skilled employee.

Goals Versus Current Performance Levels

The objective of goal setting is to establish specific, challenging goals for all workers. Individuals are encouraged or
required to have different goals dependent on their current performance level. The problem with goal setting in an organization
using industrial standards is that the organization is sending mixed messages. The supervisor is trying to establish a challenging
goal for the worker (often above standard performance level) while the organization has previously defined standard performance
as acceptable.

One means of possibly reducing the above problem is for the supervisor to assign goals. The supervisor could then set goals
based on current performance independent of existing standards. Research has shown that if goal difficulty is held constant,
equal goal acceptance and performance improvements are obtained regardless of whether goals are assigned or set
participatively (Dossett, Latham, & Mitchell, 1979; Latham & Saari, 1979; Latham, Steele, & Saari, 1981). However, there is
some evidence to suggest that when both participative and assigned goals are set independently, participative goal setting may
result in more difficult goals (Latham & Yukl, 1975a; Latham, Mitchell, & Dossett, 1978). Given the current state of
knowledge, it would be difficult to predict which method would be more effective in organizations with existing performance
standards. Also, regardless of the method used, it is not certain whether workers would set or accept goals above
standard.

While the best means of setting goals remains unclear, there is one sub-group of workers who might be expected to improve
more as the result of a goal setting program in an industrial organization: low performers. Individuals who are currently
performing below standard are not faced with conflicting messages when goals are established by or with the supervisor. In
addition, it is possible that high performers are more likely to understand task requirements and have personal performance goals
than are low performers. Thus, it seems reasonable to expect a goal setting intervention with production workers to have its
greatest impact on low performers.

One recent study supports this contention for nonproduction workers. Pritchard, Bigby, Beiting, Coverdale, and Morgan
(1981) found that for data transcribers, goal setting and feedback had a positive impact on poor performers but no impact on
good performers. They argued that since the treatment was designed to increase motivation, and since the good performers were
probably already motivated, the treatment had little impact on them.

The  purpose  of  the  current  study  was  to  examine  the  impact  of  goal  setting  and  feedback  on  the  performance  of  Navy 
industrial employees working within the context of existing performance standards. It was hypothesized that low performers
(workers  historically  performing  below  standard)  would  have  more  difficult  goals  and  would  show  greater  performance 
improvements  than  high  performers  (workers  historically  performing  at  or  above  standard). 

METHOD 

The engine division of a Naval Air Rework Facility (NARF) served as the research setting for the study. Production workers
in this division were involved in the overhaul of aircraft engines, components, and accessories. They were all civil service
employees, predominantly male, and most had a high school education.

Prior to the goal setting intervention, assigned tasks included a description of required work and the time allocated for the
work (i.e., the performance standard). However, although workers knew how well they performed on individual tasks when they
completed them, they were not provided with any summary feedback of their performance on all work during a given time
period. Since feedback has been shown to be a necessary condition for goal setting to be effective (see Locke et al., 1981), it
was first necessary to design an individual work measurement and feedback system.

A computerized system was developed to measure individual level performance using the existing NARF management
information system. A weekly report was then generated for each employee providing performance feedback for the previous
week. The performance measure was based on how well workers performed against standard and was calculated by taking the
ratio of the total standards earned in a given week to the time expended on tasks in that same week. This figure was then
multiplied by 100. Thus, a rating of 100 meant that an individual completed all work within the standard time allocated. Ratings
higher than 100 indicated performance better than standard and those lower than 100, performance below standard.
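
The rating described above can be illustrated with a short sketch. It assumes the ratio is standard hours earned divided by hours expended, which is the direction consistent with ratings above 100 meaning better than standard. The function name, argument names, and sample figures are hypothetical and are not taken from the NARF system.

```python
def performance_efficiency(standard_hours_earned, hours_expended):
    """Weekly rating: 100 means the work took exactly the standard time.

    Assumption: rating = 100 * (standard hours earned / hours expended),
    so finishing work faster than standard yields a rating above 100.
    """
    if hours_expended <= 0:
        raise ValueError("hours_expended must be positive")
    return 100.0 * standard_hours_earned / hours_expended


# Hypothetical worker-weeks (illustrative numbers, not study data).
print(performance_efficiency(40.0, 40.0))  # 100.0, exactly at standard
print(performance_efficiency(44.0, 40.0))  # 110.0, better than standard
print(performance_efficiency(36.0, 40.0))  # 90.0, below standard
```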

Twenty-two production shops in the engine division were included in the study. Each shop was supervised by its own
foreman. Eleven shops were selected for the goal setting treatment and the remaining 11 were used as a comparison group. All
shops were both spatially and structurally distinct subunits.

The 11 experimental shop foremen were trained in the use of the new worker feedback reports and in goal setting; six were
trained to assign goals to subordinates while the remaining five were trained to set goals participatively with subordinates. The
foremen were asked to arrive at different goals for different subordinates based on the worker's ability, motivational level, and
current performance. The foremen were initially resistant to the notion of setting challenging goals for workers who were
already performing at or above standard. They felt that these employees were currently doing more than should be expected of
them. However, the foremen agreed to proceed and give the program a fair chance.

The 11 foremen in the experimental groups met individually with their subordinates to either assign a challenging
performance goal or to arrive at such a goal participatively. In addition to receiving the weekly performance report, workers in
the goal setting shops met with their foremen individually every 2 to 4 weeks to discuss progress towards their goals and possible
work problems.

An 18-week period prior to the beginning of goal setting and feedback was used to establish a baseline level of performance
for both the experimental and comparison workers. The 22-week period after program implementation was used to assess
program effectiveness. Sixty-seven workers participated in setting their goals while 57 were assigned goals. The comparison
shops were composed of 117 workers.

The weekly employee performance data were aggregated to form single pre- and post-treatment performance scores for
each worker. Reliability coefficients, computed on the weekly performance measures for the baseline period, indicated that the
data were sufficiently reliable for use as overall performance measures (coefficient alpha = .IS).

Research has shown that objective measures of goal difficulty are often better predictors of performance improvement than
subjective measures (Yukl & Latham, 1978). For this reason, goal difficulty was operationalized as the difference between an
individual's baseline performance score and his/her goal. This measure of goal difficulty allowed for the partial control of
baseline individual differences in ability and motivation.
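
As a hedged illustration of this operationalization, the sketch below computes goal difficulty as goal minus baseline score. It assumes the weekly ratings were aggregated with a simple mean, which the text does not specify; the function names and sample values are hypothetical.

```python
from statistics import mean


def aggregate_score(weekly_scores):
    """Collapse weekly efficiency ratings into one score (assumed: simple mean)."""
    return mean(weekly_scores)


def goal_difficulty(goal, baseline_score):
    """Objective goal difficulty: the goal minus the baseline performance score."""
    return goal - baseline_score


# Hypothetical workers (illustrative numbers, not study data).
low_baseline = aggregate_score([88.0, 92.0, 90.0])  # below standard
print(goal_difficulty(100.0, low_baseline))          # 10.0, goal above baseline
print(goal_difficulty(100.0, 112.0))                 # -12.0, goal below baseline
```

Under this definition a goal of 100 is a hard goal for a worker with a baseline of 90 but a goal below current performance for a worker with a baseline of 112.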


RESULTS 


General  Results 


The initial analyses examined all workers independent of their baseline performance.

Manipulation check. In order to verify the treatment conditions, workers in both the assigned and participative groups were
asked to respond on a 4-point Likert scale how much influence they had in setting their goals (1 = a lot of say; 4 = no say).
Individuals in the participative condition (X̄ = 1.3) reported significantly more influence (p < .01) than did workers in the
assigned condition (X̄ = 3.0).

Performance change. The mean performance levels for workers in the treatment and comparison groups are presented in
Table 1. A test for homogeneity of regression coefficients yielded no significant differences across the groups. Therefore, an
analysis of covariance was used to contrast the experimental and comparison groups with the baseline performance measure as a
covariate. A significant main effect was found (p < .01) indicating differences in treatment performance levels across the
groups. Follow-up tests indicated that both the assigned goal setting group (adjusted X̄ = 108.3) and the participative group
(adjusted X̄ = 106.2) were significantly higher (p < .05) than the comparison group (adjusted X̄ = 101.7). There were no significant
differences between the two goal setting groups during the baseline or treatment periods.

Table 1

Mean and Adjusted Performance Efficiency Scores

                              Mean Performance Efficiency
                         Baseline (B)   Test (T)   Performance     Adjusted(a)
Group                    Period         Period     Change (T-B)    Test Period      N

Comparison                  99.5         102.4        +2.9         101.7 b,c      117
Experimental                97.5         106.4        +8.9         107.1 b        124
  Assigned goals            99.3         108.9        +9.6         108.3 c         57
  Participative goals       96.0         104.4        +8.4         106.2 c         67

(a) Adjusted to control for differences in baseline period performance.
(b) For the analysis contrasting the comparison and combined experimental groups, covariance F = 11.7, p < .001.
(c) For the analysis contrasting the comparison, assigned goals, and participative goals groups, covariance F = 6.2,
    p < .01.

Goal difficulty. A significant correlation was found between objective goal difficulty and degree of performance
improvement (r = .iiO, p < .001). No difference was found between the average level of goal difficulty in the participative group
(X̄ = 5.1) and the assigned group (X̄ = 11.0). This finding is consistent with the earlier finding indicating no difference in
performance between the two groups in the treatment period. One interesting finding did emerge as to the actual goals set in
the two groups. Seventy-three percent of the participative workers had a goal of 100 (or standard level of performance) whereas
only S'* of the assigned workers had a goal of exactly 100. This distribution of goals at or different than 100 across the two
groups was statistically significant (chi-square = 30.2, p < .001). It thus appears that workers who had some influence in their
choice of goals preferred a goal equal to existing organizational standards.

High  Versus  Low  Performers 

Workers in the experimental and comparison groups were divided into two categories based on their level of performance
during the baseline period: (1) high performers were individuals whose average performance during the 18-week baseline period
was at or above standard (i.e., 100), and (2) low performers were individuals whose average performance was below standard.

Performance  change.  The  mean  performance  levels  for  high  and  low  performers  by  different  treatment  groups  are  presented 
in  Table  2.  Two  repeated  measures  analyses  of  variance  were  performed--one  for  experimental  high  performers  and  one  for 
experimental  low  performers.  No  main  or  interaction  effects  were  found  for  high  performers  indicating  that  there  was  no 
performance  improvement  in  either  the  assigned  or  participative  conditions.  On  the  other  hand,  a  main  effect  for  time  period 
was found for the low performers (p < .01), indicating that low performers in both the assigned and participative conditions
significantly  improved  their  performance  as  a  result  of  goal  setting. 

Table 2

Mean Performance Efficiency Scores for
High and Low Performers

                     Baseline    Test      Performance    Adjusted
Group                Period      Period    Change(a)      Test Period      N

High Performers
  Experimental        114.0      118.3       +4.3         117.1 b         57
  Comparison          111.2      111.1       -0.1         112.3 b         52

Low Performers
  Experimental         83.5       96.3      +12.8          98.3 c         67
  Comparison           90.1       95.4       +5.3          93.3 c         65

(a) The difference between the test period and baseline period performance scores.
(b) Covariance F = 2.8.
(c) Covariance F = 6.3, p < .05.

Because regression towards the mean presented a potential confounding interpretation for the improvements with the low
performers, analyses of covariance were also performed on these data. A test for homogeneity of regression coefficients
revealed no significant difference among high and low performers across the three groups. Thus, an analysis of covariance was
conducted separately for high and low performers using the baseline performance measure as a covariate. The results were
identical to those reported earlier. High performers in the experimental and comparison groups did not differ while the low
performers in the experimental groups (adjusted X̄ = 98.3) were significantly higher (p < .05) than the low performers in the
comparison group (adjusted X̄ = 93.3). Overall, the results suggest that goal setting had a positive impact on the performance of
low performers and no effect on high performers.

Goal difficulty. One factor that could explain the different effects of goal setting on low and high performers is goal
difficulty. It was proposed that low performers would set (or be assigned) more difficult goals relative to their baseline
performance level than would high performers. The results relevant to this hypothesis are given in Table 3. The mean goal
difficulty level for all the low performers (13.2) was significantly greater (p < .001) than the mean level of goal difficulty for high
performers (-.4). On the average, high performers had goals that were slightly lower than their baseline performance level,
whereas low performers had average goals that were approximately 10 points above their baseline performance level.

Table 3

Mean Goal Difficulty, Goal Acceptance, and Performance
Change for Experimental High and Low Performers

Experimental         Mean Goal      Mean Performance
Group                Difficulty     Change                N

High Performers
  Assigned               5.0            +5.7             27
  Participative         -5.3            +3.1             30
  Total                 -0.4            +4.3             57

Low Performers
  Assigned              16.7           +13.0             30
  Participative         13.0           +12.6             37
  Total                 13.2           +12.8             67

Analyses were also undertaken to compare goal difficulty for assigned and participative workers. Results indicated that
mean goal difficulty was significantly higher (t = 2.3*, p < .05) for high performers who were assigned goals (5.0) than for high
performers who participatively set goals. Indeed, high performers who participatively set goals had an average goal that was
more than five points below their baseline performance. No significant difference was found between the mean goal difficulty
level of poor performers in the assigned (16.7) and participative (13.0) conditions.

Because the results indicated that a number of workers had negative goals (i.e., goals that were lower than their baseline
performance), additional analyses were performed to assess the relation of positive and negative goals to performance change.
Results indicated that 31% of the high performers (N = 25) had goals that were lower than their baseline performance whereas
only 3% of the low performers (N = 3) had negative goals. Seventy-five percent of these high performers with negative goals
(N = 21) were in the participative condition. A significant positive correlation was found between goal difficulty and performance
change for low performers with positive goals (r = .*1, p < .001). However, no significant relationships were found for high
performers with either negative or positive goals, although the correlation for the latter group was marginally significant (r = .27,
p < .10). Also, the performance change scores for high performers with positive goals (X̄ = 8.1) were higher than those for high
performers with negative goals (X̄ = .5), although this difference was only marginally significant (p < .10). These findings suggest
that goal setting was somewhat successful for high performers, but only if they had goals higher than their baseline
performance.


DISCUSSION 

These findings provide support for Locke's (1968) goal setting theory, although they also suggest that goal setting
effectiveness may be contingent on contextual factors. First, there was a positive relation between goal difficulty and
performance improvement; however, this relationship only held for workers whose goals were higher than their baseline
performance. In addition, consistent with the hypothesis concerning low performers and an earlier study by Pritchard et al.
(1981), goal setting was more effective with low performers than with high performers. This differential impact was
reflected both in terms of greater goal difficulty and more performance improvement.

The goal setting process appeared to be affected by the NARF's use of engineered performance standards. This is supported
by the lower mean goal difficulty level for high performers (relative to low performers) that occurred both when goals were
assigned and participatively set. Assigned goal setting did result in more difficult goals for high performers than did
participative goal setting; however, this difference was not reflected in significant differences in the degree of performance
improvement for the two groups. The large proportion of goals that were participatively set at 100 (standard performance level)
also suggests that organizational task standards can influence the goal setting process. Workers may have felt that 100% was
the most reasonable goal for the organization to expect them to achieve, independent of their baseline performance. With the
exception of these findings, participative and assigned goal setting yielded virtually identical results. This is consistent with a
large number of lab and field studies (see Locke et al., 1981). The failure to find more difficult goals set in the participative
treatment groups may partially reflect the role of context. Where workers had some influence over their goals, they often opted
for what they considered to be fair (i.e., standard) rather than what they felt would be challenging.

Some  caveats  seem  in  order.  First,  the  sample  size  was  not  large,  especially  when  it  was  broken  down  into  subgroups. 
Second,  the  characteristics  of  the  work  force  may  have  played  an  important  role.  Navy  production  workers  have  more  job 
security  than  most  private  sector  industrial  employees.  Thus,  these  workers  may  have  felt  more  latitude  in  choosing  negative 
goals.  Finally,  goal  setting  effectiveness  was  only  assessed  over  3-1/2  months.  There  is  some  evidence  to  suggest  that  goal 
setting effects are not sustained over longer time periods (see Ivancevich, 1976).

Overall, the results of this study support the general contention of this paper. Goal setting is an effective motivational
technique for Navy production workers but is subject to contextual constraints. In this sense, it has both potential utility and
potential problems. There are limitations as to conditions where goal setting works best and for whom it works best (Ivancevich,
1971). There is a need to follow Latham and Yukl's (1975b) suggestion that future research in goal setting begin developing more
of a contingency framework. This study was one step in that direction.

Background

In many organizations goal setting has been found to be a powerful technique for influencing work motivation and
performance (Locke, Feren, McCaleb, Shaw, & Denny, 1980). While the positive effects of goal setting on worker productivity
have been demonstrated repeatedly, a major weakness of this approach is the failure to specify the process by which goals are
set. The central argument of this paper is that the process of goal choice may be central to understanding the relationship
among organizational context, goal setting, motivation, and performance.

With few exceptions, there has been little research directed toward understanding the determinants of goal choice,
acceptance, and commitment (Steers & Porter, 1979). This is unfortunate because goal setting does not take place independently
of the work place; reward systems and other work setting characteristics come together to affect goal choice, acceptance, and
commitment (Crawford, 1982). Some investigators, however, have attempted to use an expectancy theory model to explain goal
choice and acceptance. For example, in both laboratory and field settings goal acceptance has been reliably predicted using
expectancy and valence measures (Dachler & Mobley, 1973; Mento, Cartledge, & Locke, 1980; Steers, 1973). In a related line of
research, expectations of success and the value placed on the outcomes of goal attainment were found to be the principal
determinants of "level of aspiration" (Frank, 1941; Hilgard, 1942/1958). These two factors are highly related to the core
concepts in expectancy theory: expectancy and valence (Vroom, 1964).

In  addition  to  a  limited  understanding  of  how  people  set  goals,  the  relationship  between  goal  setting  and  other  motivational 
techniques  is  unclear.  This,  in  part,  may  be  due  to  the  fact  that  there  has  been  little  integration  of  goal  setting  with 
motivational  theories.  The  role  of  motivational  techniques,  such  as  monetary  incentives  and  goal  setting,  in  work  motivation  and 
performance  is  one  of  the  most  underresearched  and  poorly  understood  areas  in  organizational  behavior  (Opsahl  &  Dunnette, 
1966;  Lawler,  1981). 

The  need  for  a  better  understanding  of  the  process  of  goal  choice  is  evident.  This  process  may  provide  insight  into  the 
relationship  among  goal  setting,  organizational  context,  motivation,  and  performance.  Also,  this  process,  if  linked  to  motivation 
theory,  should  help  to  clarify  the  relationship  among  goal  setting  and  other  motivational  techniques,  such  as  monetary  incentives. 
The purpose of this study is to offer a preliminary model of goal choice, work motivation, and performance. This model is
presented in Figure 1. As a preliminary test of its validity, selected elements in the model will be examined to determine its
usefulness as an explanatory device.

Figure 1. A Model of Goal Choice, Work Motivation, and Performance.

Work Motivation Model


Based  on  expectancy  theory  concepts  and  processes,  the  model  shown  in  Figure  1  explains  work  motivation  and  performance 
as  a  cognitive  process  where  an  individual  chooses,  from  alternative  performance  goal  levels,  the  level  perceived  to  be  most 
attractive.  This  perception  of  attractiveness  is  based  on  various  beliefs  and  feelings  a  person  has  regarding  the  likelihood  that 
performing  at  certain  levels  will  lead  to  particular  job  outcomes.  Contextual  factors,  such  as  the  opportunity  to  earn  monetary 
incentives  for  good  performance,  will  influence  these  beliefs  and  feelings.  The  hypothesized  effect  of  the  performance  goal  is  to 
influence the amount of effort a person is willing to expend in accomplishing the goal. Furthermore, an individual's
self-assessment of ability as well as actual ability are presumed to moderate the relationships among performance goal, effort, and
performance.  Since  in  this  model  the  goal  concept  is  a  major  determinant  of  effort  and  performance,  it  is  crucial  to  understand 
how  people  choose  their  performance  goals. 


The model suggests that contextual factors influence valence, an individual's anticipated satisfaction with particular levels
of different job outcomes, and instrumentality, the expectancy that different performance levels are associated with different
outcomes. Contextual factors also affect expectancy, a person's belief concerning the likelihood of achieving a particular level
of performance if they tried their best. Valence and instrumentality combine multiplicatively to determine performance
valence. Performance valence is a hypothetical construct that represents the anticipated satisfaction of performing at a given
level of performance. The anticipated satisfaction for a given performance level is derived from its degree of association with
particular job outcomes and the valence of those job outcomes to the individual. Performance valence combines multiplicatively
with expectancy. This product becomes the perception of attractiveness for each performance level. In essence, each possible
level of task performance acquires valence through its association with certain job outcomes and the anticipated satisfaction
associated with these outcomes. This performance valence is then modified by a person's belief concerning the likelihood of
achieving that level of performance given his or her best effort. The result is a perception of attractiveness for each level of
performance.
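
The multiplicative combination described above can be sketched as follows. This is a minimal illustration rather than the authors' scoring procedure: it assumes that the valence-by-instrumentality products are summed across job outcomes, as is common in expectancy theory formulations, and all names and numbers are hypothetical.

```python
def performance_valence(valences, instrumentalities):
    """Anticipated satisfaction of performing at one level.

    valences: anticipated satisfaction with each job outcome.
    instrumentalities: perceived association between this performance
    level and each outcome. Summing the products is an assumption.
    """
    return sum(v * i for v, i in zip(valences, instrumentalities))


def attractiveness(expectancy, valences, instrumentalities):
    """Performance valence weighted by the belief that the level is attainable."""
    return expectancy * performance_valence(valences, instrumentalities)


# Hypothetical outcomes: an incentive payment (valued) and fatigue (disliked).
valences = [5.0, -2.0]
instrumentalities = [0.8, 0.5]   # links between one performance level and each outcome
print(performance_valence(valences, instrumentalities))   # 3.0
print(attractiveness(0.5, valences, instrumentalities))   # 1.5
```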


The model specifies that goal choice is based on a person's evaluation of the relative attractiveness of various performance
levels. The model is flexible in that it accommodates alternative decision strategies (e.g., return on effort, maximization, value
matching). In this study the return on effort approach, which assumes that people use an incremental decision rule in choosing a
goal, was used in determining the goal choice prediction. With this approach the model would predict goal choice to be some
measure reflecting the marginal gain in the attractiveness of performance for performing at a particular level. While this
approach has been successfully employed to improve expectancy theory predictions of performance (Kopelman, 1977), empirical
evidence is lacking concerning the relative accuracy of alternative decision strategies. Additional work is needed to determine
whether the return on effort approach offers the best representation of the goal choice process.
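
One possible reading of the return on effort rule is sketched below. The text says only that the prediction reflects the marginal gain in attractiveness, so the specific rule used here (predict the level with the largest gain over the next lower level) is an assumption made for illustration, as are the function names and numbers.

```python
def marginal_gains(attractiveness_by_level):
    """Gain in attractiveness from moving up to each level from the one below it."""
    levels = sorted(attractiveness_by_level)
    return {
        higher: attractiveness_by_level[higher] - attractiveness_by_level[lower]
        for lower, higher in zip(levels, levels[1:])
    }


def predicted_goal(attractiveness_by_level):
    """Assumed rule: choose the level offering the largest marginal gain."""
    gains = marginal_gains(attractiveness_by_level)
    return max(gains, key=gains.get)


# Hypothetical attractiveness scores for four performance levels.
scores = {80: 1.0, 90: 1.5, 100: 2.5, 110: 2.75}
print(marginal_gains(scores))   # {90: 0.5, 100: 1.0, 110: 0.25}
print(predicted_goal(scores))   # 100
```

A maximization strategy would instead pick the level with the highest attractiveness (110 in this example), which shows how the alternative decision strategies can yield different goal predictions from the same perceptions.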

The remainder of the model describes the process by which performance goals are translated into work motivation, and the
work motivation into task performance. The hypothetical relationship of these concepts and the process by which goals are
translated into performance is based on the broad theoretical position that Performance (P) equals the product of Ability (A) and
Motivation (M); (P = A × M).

Though cognitions serve telic purposes, they are influenced by past behavior and experience. The model specifies feedback
loops suggesting that a person's effort-performance expectancy is influenced by past expenditures of effort and performance.
Also, past performance affects both a person's objective ability and their subjective estimate of their ability. These factors, in
turn, affect future effort and performance. The model is dynamic in that the source of purposive action is cognitive activity,
though not necessarily conscious, that is influenced by past action and its consequences.

Method 


Subjects 

One hundred and thirty experimental subjects participated in this study. Their average age was 21 years. Seventy-one of
the subjects were female and 59 were male. Approximately 40% were high school students and the remaining 60% were
undergraduate college students. Some data for six subjects were missing and therefore were unavailable for some of the
analyses.

Procedure 

The present study was part of a larger work simulation study designed to investigate the effects of alternative incentive
magnitudes on performance (Riedel, Nebeker, & Cooper, 1985).

Subjects were recruited for part-time employment to perform a clerical transfer task. The 130 subjects who qualified for
the job were assigned randomly to 1 of 7 experimental conditions differing in terms of the magnitude of incentive offered for
various levels of performance. They worked 5 days, 4 hours a day, for a total of 20 hours at a rate of $<k40 per hour.

Research questionnaires were administered three times: after assignment to an experimental condition, at the start of the
third day, and at the start of the fifth day. These questionnaires contained the expectancy and goal items needed for evaluating
the model predictions. The quality and quantity of performance were recorded daily. A detailed description of the experimental
procedure, treatment conditions, constructs and measures, and method of wage and incentive payment can be obtained from the
author.


Results 


Manipulation  Check 

Incentives and performance. It was expected that subjects in the incentive conditions would perform better than subjects in
the nonincentive groups. To test this hypothesis, an analysis of variance was performed with treatment condition as the
independent variable and performance as the dependent variable.

The results of this analysis suggest a significant treatment effect, F(6,120) = 3.27, p < .005. A planned comparison of the
performance means for the incentive and nonincentive groups revealed a significant difference, t(120) = 3.87, p < .001.

Incentives and instrumentality. Subjects were asked the amount of pay they expected to receive if they were to perform at
alternative levels of performance. Judging from the responses, the performance-pay relationship was accurately perceived by
most subjects across the treatment groups. For all conditions the reported pay instrumentalities approximate the actual
relationships between pay and performance.

Model  Predictions 


The central research question pertained to the capacity of the model to account for the process of goal choice and task
performance. To evaluate the model, goal choice (level) was predicted by the model, using a return on effort decision algorithm.
This prediction was compared with self-reported goal choice. Also, the ability of the model to predict performance was
evaluated by correlating the predicted performance with actual task performance. The results of these analyses are summarized
below, first for goal choice and then task performance.

Goal choice. Prior to evaluating the goal choice prediction, responses to the self-report goal choice question were
examined. Twenty-six subjects selected a single quantitative goal (e.g., 5 units per hour), 39 subjects selected a quantitative
goal with a range (e.g., 5-7 units per hour), and the remaining subjects set non-quantitative goals (e.g., to do my best). To
increase the sample size for the single quantitative goal category, single goals for subjects with quantitative range goals were
computed by averaging the upper and lower anchors of their range goal response. This resulted in 65 cases with a single
quantitative production goal.

The model prediction of goal choice correlated significantly with self-reported goal choice (r = .51) and task performance
(r = [illegible]), both significant (p < .001). It was of interest to determine if the relationship between this goal measure and
performance was maintained for the entire sample, including people without quantitative goals. For the entire sample the
correlation between the predicted goal choice and performance was r = .54, and for people without quantitative goals it was r = .57,
both significant (p < .001). Also, the results of a t test indicate no significant mean difference in performance between those
people who set quantitative goals and those people who did not set a quantitative goal. It appears that the goal choice prediction
from the research model, based on the return on effort algorithm, relates highly to self-reported goal choice. The model also
provides a significant prediction of performance, regardless of whether a subject reported setting a quantitative goal.

Performance. The capacity of the model to predict performance was evaluated by correlating the predicted performance
with actual task performance. The model prediction of performance was significantly correlated with actual performance on the
task (r = .46, p < .001). While this prediction was slightly better for subjects who set quantitative goals (r = .55, p < .001) than
subjects who did not (r = .39, p < .001), the difference between these correlations was not significant. These findings provide
preliminary support for the validity of the research model in predicting performance.
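
One common way to compare two correlations from independent groups is Fisher's r-to-z transformation. The paper does not say which test was used, so the sketch below is an assumption rather than a reconstruction of the analysis. The function name is hypothetical, the second correlation is read here as .39, and the size of the group without quantitative goals (59) is an illustrative guess; only the 65 quantitative-goal cases are reported in the text.

```python
import math


def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for two correlations from independent groups."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    # Two-tailed p value from the standard normal distribution.
    normal_cdf = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    p = 2.0 * (1.0 - normal_cdf)
    return z, p


# n = 65 is the reported number of quantitative-goal cases;
# n = 59 for the comparison group is assumed for illustration only.
z, p = compare_independent_correlations(0.55, 65, 0.39, 59)
print(f"z = {z:.2f}, p = {p:.2f}")
```

Under these assumed group sizes the difference falls well short of conventional significance, which agrees with the conclusion reported above.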

Discussion 


The central purpose of this study was to improve our understanding of the process of goal choice. Overall the goal choice
process specified in the research model was supported by the findings. First, monetary incentives were found to influence pay
instrumentality. Second, the cognitive components of goal choice which were specified in the research model predicted
self-reported goals and performance, suggesting that the process of goal choice may be linked to expectancy theory concepts and
processes.

The effect of incentives on pay instrumentality indicates that the treatment affected individual perceptions about the
amount of pay associated with alternative levels of performance. Results show that the pay instrumentalities approximate the
actual relationships between pay and performance, indicating that the pay contingencies were perceived quite accurately. In
terms of the model, the effect of the treatment was to increase instrumentality and thereby increase performance valence, the
anticipated satisfaction of performing at a given level of performance. The performance valence for a given performance level
is derived from its degree of association with particular job outcomes and the valence of those job outcomes to the individual. It
can be concluded that the experimental treatment was very effective in strengthening this association.

The goal choice process specified in the research model was supported by the findings, suggesting that expectancy theory
concepts may be useful in understanding the cognitive components of goal choice. The combination of these expectancy constructs
produced a reasonably accurate prediction of goal choice. The predicted goal choice was significantly correlated with the actual
self-reported goal. This finding suggests the interpretation that goal choice is a cognitive process where an individual chooses,
from alternative performance goal levels, the level perceived to be most attractive. This perception of attractiveness is based
on various beliefs and feelings a person has regarding the likelihood that performing at certain levels will lead to particular job
outcomes. The results indicate that contextual factors, in this case the opportunity to earn monetary incentives for good
performance, influence these beliefs and feelings.

This study has contributed to a better understanding of the relationship between organizational context and goal setting as
they relate to work motivation and performance. The findings suggest that the process of goal choice is central to understanding
how contextual variables influence goals, motivation, and performance. Moreover, the research model provides a useful starting
point for investigating the relationships between organizational context and employee cognitions and perhaps for integrating goal
setting with expectancy theory.

THE EFFECTS OF REWARD MAGNITUDE AND DIFFICULTY OF PERFORMANCE
STANDARDS UPON INDIVIDUAL PRODUCTIVITY

Delbert  M.  Nebeker1 

Navy  Personnel  Research  and  Development  Center 
San  Diego,  CA  92152-6800 

Management systems that control productivity and performance
are critical to the success of organizations. One form of
control system, known as a Performance Contingent Reward System
(PCRS), is receiving increasing attention in the U.S.,
particularly when it involves financial incentives. This is
partly a function of the critical productivity problem we face in
this country, and partly because recent evidence shows financial
incentives to have a strong positive impact on performance (e.g.,
Nebeker & Neuberger, 1985; Locke, Feren, McCaleb, Shaw, & Denny,
1980). In spite of this evidence, the use of financial incentives
as a means to increase productivity remains a controversial issue
(Belcher, 1974; Lawler, 1983). If financial incentives are to be
used effectively as a means to improve worker efficiency, we must
have a better understanding of how they operate. We need to
know how reward systems should be designed to maximize their
value.

The  design  of  reward  systems  can  vary  along  a  number  of 
different dimensions. These include the following:

1.  Objectivity  of  performance  measure.  The  degree  to  which 
the  performance  measure  is  measured  objectively  as  opposed 
to  subjectively. 

2. Performance aggregation level. The number of people
included in the performance measure who share a reward.

3.  Performance  standard.  The  difficulty  of  the  performance 
level  required  to  earn  a  reward. 

4. Sharing rate. The percent of earnings "saved" by
performing above standard that is given as a reward.

5.  Performance  period.  The  length  of  time  that  performance 
data  is  accumulated  before  a  reward  determination  is  made. 

6. Feedback type. The method of providing performance
feedback.

7.  Feedback  period.  The  length  of  time  between  performance 
feedback. 


8. Performance-reward function. The shape of the function
relating reward to performance.

9. Incentive period. The length of time following the
performance period before payment is made.

1The views expressed in this paper are those of the author and
are not official and are not necessarily those of the Department
of the Navy.

Little  is  known  about  the  optimal  values  for  these 
parameters  in  various  work  situations.  Virtually  no  empirical 
research  directly  addresses  these  issues  in  work  environments. 
Theory is only slightly more helpful. If we were going to get
answers about the optimal values for these parameters we were
going to have to conduct some parametric studies. Our field
experience  had  suggested  that  two  of  the  more  important 
parameters  in  the  design  of  reward  systems  were  standard 
difficulty  and  reward  magnitude. 

In reviewing the literature we found the following: There
is disagreement over whether performance standards ought to be
made difficult to reach (Locke et al., 1981; Barnes, 1980) or
attainable for most workers and therefore easy for many (Peters &
Waterman, 1982; Motowidlo et al., 1978). Evidence for both
points of view can be cited. One possible reason for the
apparent contradiction is that the effects of reward
magnitude have not been adequately considered in research on
standards or goal difficulty. It is quite likely, then, that the
effects of easy or difficult performance standards are moderated
by the magnitude and/or the attractiveness of the rewards
available for reaching and exceeding these standards (Matsui,
Okada, & Mizuguchi, 1981).

The instrumental learning and conditioning literature (cf.
Logan, 1970, pp. 90-91) posits that increasing magnitudes of reward
have "diminishing returns" on performance (at least for rats).
This  suggests  that  very  large  rewards  are  likely  to  have  a  lower 
marginal  utility  than  moderate  rewards.  Some  of  our  own 
preliminary  research  supports  this  contention  with  people  at 
work.  Practical  applications  of  reward  systems  in  real 
organizations,  however,  show  wide  variability  in  the  amounts  of 
reward  offered  for  performance  above  standard.  Examples  in 
business  and  industry  can  be  found  with  sharing  rates  ranging 
from  10%  to  over  100%.  It  is  reasonable  to  assume  then  that 
systems  designed  to  pay  very  large  rewards  are  likely  to  produce 
marginally  less  improvement  and  be  less  cost  effective  than 
systems  that  pay  more  moderate  amounts. 

The present research was designed to help us understand the
interactive effects of these two variables by exploring the joint
effects on productivity of varying degrees of reward magnitude,
as defined by sharing rate, and standard difficulty.
Furthermore, it is expected that worker ability will affect the
relationship between standard difficulty and reward magnitude.

Method


Research  setting 

This  research  was  conducted  in  the  Organizational  Systems 
Simulation  Lab  (OSSLAB)  at  the  Navy  Personnel  Research  and 
Development  Center.  The  OSSLAB  is  designed  to  create  a  high 
fidelity  simulation  of  real  computerized  work  environments. 
Microcomputers  are  used  as  workstations  so  that  individuals  can 
be  hired  to  do  real  work  tasks  under  experimentally  controlled 
conditions. 

Subjects  and  design 

Twenty-four  employees  (8  males, 16  females)  were  recruited 
and  hired  (at  $4.89  per  hour)  to  provide  technical  support  to  the 
Navy  Personnel  Research  and  Development  Center.  Their  job  was  to 
enter  and  maintain  references  in  a  data  base  for  searching  and 
retrieving  the  scientific  literature.  The  Ss  were  required  to  be 
keyboard  proficient  before  being  hired.  They  worked  two  4-hour 
shifts a week for eight weeks. A work sample test given to
measure ability revealed no significant differences between the
two shifts.

The research design called for the Ss to perform their work
under three reward conditions: (1) baseline or control; (2) small
incentives, wherein 15% of the wages saved by performing above
standard were paid to the employee as a bonus; and (3) large
incentives, wherein 50% of the wages saved by performing above
standard were paid to the employee as a bonus. Furthermore, the
shifts  were  designated  as  either  the  easy  standard  group  or  the 
difficult  standard  group.  The  easy  standard  group  had  a 
performance  goal  or  standard  set  at  the  20th  percentile  of  the 
group's  baseline  performance  level.  This  meant  that  80  percent 
of  the  group  was  already  exceeding  the  standard  when  the 
incentives  were  introduced.  The  difficult  standard  group  had 
their  performance  goal,  or  standard,  set  at  the  90th  percentile. 
This  meant  that  only  10  percent  of  the  group  were  exceeding  the 
standard  when  the  first  incentive  was  introduced.  These  values 
were chosen to match two interesting findings. Barnes (1980)
demonstrates  that  standards  set  by  usual  industrial  engineering 
methods  typically  produce  standards  that  only  10%  of  the  people 
exceed  without  performance  standards  and  feedback  or  incentives. 
The  difficult  standard  shift  was  designed  to  match  this 
condition.  The  easy  standard  was  chosen  to  match  the  finding 
that  80%  of  workers  believe  they  are  performing  above  average. 
Thus the standard would be consistent with their own self-
concept.
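
Setting standards at a percentile of baseline performance can be sketched as follows. The paper does not describe how the percentiles were computed, so the interpolation method, the helper names, and the baseline keystroke rates are all assumptions made for illustration.

```python
import statistics


def percentile(values, pct):
    # statistics.quantiles with n=100 returns the 1st..99th percentile
    # cut points, so the pct-th percentile sits at index pct - 1.
    return statistics.quantiles(values, n=100, method="inclusive")[pct - 1]


def share_exceeding(values, standard):
    # Fraction of workers whose baseline rate is already above the standard.
    return sum(v > standard for v in values) / len(values)


# Hypothetical baseline keystroke rates (keystrokes per hour) for one shift.
baseline = [6000, 6500, 7000, 7500, 8000, 8500,
            9000, 9500, 10000, 10500, 11000, 11500]

easy_standard = percentile(baseline, 20)        # 20th percentile
difficult_standard = percentile(baseline, 90)   # 90th percentile

print(easy_standard, share_exceeding(baseline, easy_standard))
print(difficult_standard, share_exceeding(baseline, difficult_standard))
```

With a group this small, the share of workers above each standard only approximates the nominal 80 percent and 10 percent figures.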

The  final  design  was  a  2  X  3  factorial  design  where  standard 
difficulty  level  was  a  between-subject  factor  and  incentive  level 
was  a  within-subject  factor. 

Procedure 

On the first day of work the Ss reported to the job site and
were welcomed by their first and second level supervisors. They
were given an overview of the work they would be doing, with an
emphasis being placed upon the value of their job and how
important accuracy and quality were to the information system,
along with speed.

They were then trained on the use of their work stations
(IBM PC-XT microcomputers). This included the use of both
hardware  and  software.  Upon  completion  of  this  training  their 
shift  was  completed  for  the  day  and  they  were  excused  to  leave. 

On  the  second  day  they  were  given  additional  training  on  the 
task,  allowed  several  practice  items  and  then  given  the  work 
sample  test.  Following  the  work  sample  they  began  their  assigned 
work  of  entering  and  maintaining  the  references  in  the  data  base. 
The remainder of their employment consisted of their performing
the task each workday. The only variations in this schedule
were: (1) the introduction of the small incentive during the
third week and the introduction of the large incentive during the
sixth week; (2) the administration of work perception
questionnaires on four occasions spread throughout their eight
weeks of employment; and (3) the readministration of the work sample
test at the end of the second week.

Performance was measured by keystroke rate, the number of
keystrokes per hour. At any time during the experiment the
workers could choose to view, on their screens, one of several
reports of their current and "to-date" performance. During the
baseline/control condition, prior to the introduction of the
incentives, these reports included only raw performance
information such as keystroke rate, hours on the tasks, and
regular pay. Following the introduction of the incentives,
however, the reports added: (1) a listing of standards and the
worker's current and to-date performance efficiency against these
standards (e.g., keystrokes per hour/standard keystrokes per
hour); (2) the current and to-date bonus earned for exceeding the
performance standards; and (3) current and to-date total earnings.
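
The quantities shown on such a report can be illustrated with a short sketch. Performance efficiency follows the ratio given in the text. The definition of "wages saved" is an assumption: here it is the pay for the extra hours the same output would have required at exactly standard pace. The function name, field names, and example figures are hypothetical; only the $4.89 wage and the 15% and 50% sharing rates come from the paper.

```python
def performance_report(keystrokes, hours, standard_rate, hourly_wage, sharing_rate):
    rate = keystrokes / hours                 # keystrokes per hour
    efficiency = rate / standard_rate         # 1.0 means exactly at standard
    # Assumed definition: hours the same output would take at standard pace.
    standard_hours = keystrokes / standard_rate
    wages_saved = max(0.0, standard_hours - hours) * hourly_wage
    bonus = sharing_rate * wages_saved        # share of savings paid to the worker
    regular_pay = hours * hourly_wage
    return {
        "rate": rate,
        "efficiency": efficiency,
        "regular_pay": regular_pay,
        "bonus": bonus,
        "total_earnings": regular_pay + bonus,
    }


# Hypothetical 4-hour shift at 10,000 keystrokes per hour
# against a standard of 8,000 keystrokes per hour.
small = performance_report(40000, 4.0, 8000, 4.89, 0.15)
large = performance_report(40000, 4.0, 8000, 4.89, 0.50)
print(small["efficiency"], round(small["bonus"], 2), round(large["bonus"], 2))
```

A worker at or below standard earns no bonus under this definition, because the saved hours are floored at zero.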

Results  and  Discussion 

Daily  performance  means  were  used  as  the  chief  dependent 
variable  in  a  series  of  moderated  multiple-regression  equations. 
In these equations the ability score was entered first, as a
subject-covariate factor; then the dummy coded treatment main
effects; followed by the two-way interactions; and finally the
three-way interactions.
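
The order of entry can be made concrete by building the predictor set for one observation, step by step. The paper does not list its exact terms, so the coding below is an assumption: one dummy for the difficult standard, two dummies for the small and large incentive conditions (baseline as the reference), and product terms for the interactions. All names are hypothetical.

```python
def predictors_by_step(ability, difficult_standard, incentive):
    """Return the predictors for one observation, grouped by entry step."""
    d = 1.0 if difficult_standard else 0.0
    small = 1.0 if incentive == "small" else 0.0
    large = 1.0 if incentive == "large" else 0.0
    return [
        # Step 1: ability covariate.
        {"ability": ability},
        # Step 2: dummy coded treatment main effects.
        {"difficult": d, "small": small, "large": large},
        # Step 3: two-way interactions (small*large is always 0, so omitted).
        {
            "ability*difficult": ability * d,
            "ability*small": ability * small,
            "ability*large": ability * large,
            "difficult*small": d * small,
            "difficult*large": d * large,
        },
        # Step 4: three-way interactions.
        {
            "ability*difficult*small": ability * d * small,
            "ability*difficult*large": ability * d * large,
        },
    ]


steps = predictors_by_step(ability=0.5, difficult_standard=True, incentive="large")
n_predictors = sum(len(step) for step in steps)
print(n_predictors)
```

Under this assumed coding the full equation contains 11 predictors, which is consistent with the numerator degrees of freedom of 11 reported with the F tests, although that agreement does not confirm the specific terms used.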

The  results  of  these  analyses  reveal  a  highly  significant 
multiple  R  at  each  step  and  an  overall  R=.89  (R2=.79;  p<.001). 

In this regression three significant predictors of keystroke rate
were found. Ability, as measured by the work sample test,
accounted for a large portion of the performance variance
(Beta = .583; F = 42.42; df = 11/202; p < .001). Also contributing
significant portions of the variance to the prediction of
performance were the interaction of standard difficulty with the
large incentive manipulation (Beta = 1.213; F = 8.21; df = 11/202;
p < .005) and a three-way interaction between ability, standard
difficulty, and large incentives (Beta = -1.274; F = 9.928;
df = 11/202; p < .005). The relationships produced from these
effects are shown in the figure below.


As  can  be  seen  the  independent  variables  and  their 
interactions  produce  some  interesting  effects  on  keystroke  rate. 
First,  it  is  obvious  that  there  are  large  ability  differences  in 
performance  and  interaction  effects  for  incentive  level,  standard 
and  ability.  High  ability  workers  perform  significantly  better 
in  all  conditions  than  low  ability  workers.  Second,  when 
performance  standards  are  relatively  easy,  increases  in 
incentives  result  in  performance  increases  for  all  ability  levels 
at each level of incentive. When the standards are difficult,
however, performance for the high ability workers first
increases for the small incentives and then decreases when the
large incentives are introduced. Finally, with small incentives
and the difficult standards, the low ability workers perform
better than their counterparts with easy standards. With the
introduction of the large incentives, however, their performance
does not improve, while the easy standard group's does, allowing
the easy standard group to outperform the difficult standard
group.

From  these  results  the  following  points  appear  clear:  (1) 
Performance  improves  substantially  with  the  introduction  of  both 
standards  and  incentives  regardless  of  whether  or  not  the 
standards  are  difficult  or  easy  and  whether  or  not  the  incentives 
are  large  or  small.  (2)  It  appears  that  the  amount  of 
improvement  with  easy  or  difficult  standards  depends  upon  whether 
or  not  small  or  large  incentives  are  being  offered  for  exceeding 
them.  The  best  performance,  contrary  to  Locke  et  al.'s  position 
(1981),  occurs  when  the  standards  are  at  the  20th  percentile  and 
the sharing rate is 50%. (3) High ability workers
under conditions of both high standards and large incentives seem
either to be discouraged by the opportunity to earn incentives
or to discount the value of increasing performance, and they
actually reduce their performance. Either of these explanations requires
a  view  of  small  incentives  as  providing  only  an  achievement  or 
competence  motive  for  reaching  the  goal.  When  large  incentives 
are  introduced,  however,  financial  motives  for  reaching  the  goal 
may become dominant. Under these conditions discouragement
and/or relative value may be significant factors
affecting performance. Which of these explanations (or some
other possibility) is superior remains for future research to
determine.

External Evaluation
System Development
in the Canadian Forces

Commander  R.H.  Kerr  CF 
and 

Mr.  Duane  Tyerman 


Background 

The External Evaluation Process as part of the ISD (Instructional
Systems Development) model is under active study within several commands in
the CF (Canadian Forces). One such study, which is still ongoing, was
described in MTA Proceedings (Kerr et al., 1984); this short addendum reports
results and analysis of the findings in the studies involving 79 Dental
Technicians (graduates and supervisors) and 467 Administrative Clerks
(graduates and supervisors). Action proposed and underway is discussed in the
last portion of this report.

System  Description 


The system employed a mail-out questionnaire approach (individuals
also  responded  by  mail).  Question  formats  required  the  graduates  and 
supervisors  to  assess  task  completion  in  accordance  with  a  question  grid 
providing  for  eight  possible  response  patterns  per  task.  Both  trades  were 
assessed  on  40-50  tasks.  Response  sheets  were  manually  input  into  computer 
files  and  subsequently  edited  and  analysed  using  special  programmes  designed 
specifically  to  be  'user  digestible'.  Details  of  the  formats  and  outputs 
are  described  in  the  previous  paper. 

Results  -  Input  Analysis 

Ninety-eight percent of the questionnaires were returned and 42% of
the response sheets contained recording errors. Further analysis resulted
in only 18% of task responses being spoiled out of a total of 43,680.
Although an 18% spoilage rate is considered acceptable by the authors,
revisions to the input formats have been made and considerable improvement is
expected in reducing recording errors. A contract has been awarded to a
research firm to utilize optical mark reader equipment incorporating the
editing function. Because of the relative uniqueness of the input format,
and the multi-track aspects of response patterns possible in such a paper-
based system, no other option appears available in a 'mail-out' setting.

The  majority  of  personnel  responding  found  the  format  easy  or  very  easy  to 
complete  and  spent  a  mean  time  of  32  minutes  completing  the  questionnaire. 

As indicated in the previous paper, test-retest reliability was
attempted with a sample of 60 of the original population, by a
readministration of the same instrument one month later. After editing, 74% of
responses were identical to the original administration of those sampled.
Because of the multi-track aspects of response possibilities and because some
response change is expected (e.g., personnel may now have performed a task not
performed at the first administration) this response is taken to be
reasonably reliable, and with form revision, line correlations should improve.
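
The test-retest figure is a simple percent agreement between the two administrations. A minimal sketch follows, assuming each task response is stored as one of the eight response-pattern codes; the function name and the sample responses are hypothetical.

```python
def percent_identical(first, second):
    """Percent of task responses that are identical across two administrations."""
    if len(first) != len(second):
        raise ValueError("both administrations must cover the same tasks")
    same = sum(a == b for a, b in zip(first, second))
    return 100.0 * same / len(first)


# Hypothetical response-pattern codes (1-8) for eight tasks from one respondent.
first_administration = [3, 1, 8, 2, 5, 5, 4, 1]
second_administration = [3, 1, 7, 2, 5, 6, 4, 1]

agreement = percent_identical(first_administration, second_administration)
print(agreement)
```

Percent agreement does not separate genuine change (a task newly performed since the first administration) from recording error, which is the caveat the authors raise above.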


Processing  and  Outputs 


Processing was described in the previous paper and outputs were
generated and provided to Training Managers. The External Evaluation business
could provide some earth-shaking insights into training continua but not in
these instances! Cases involving performance deficiencies beyond the
limited level were virtually non-existent and those tasks reported as being
not required were easily explained away as being infrequently performed. A
good example was the completion of a casualty form by a junior
administrative clerk. Infrequently completed tasks such as these are being examined
for the degree of emphasis that these tasks have on course lengths. Further
action may be taken to alter job specifications which may eventually affect
the level at which the training is conducted. The authors are convinced
that their next populations should be chosen to expose and quantify areas
where some more serious problems are known to exist, which would enable more
components in training continua to be examined.

Training  Managers  on  the  whole  were  pleased  with  and  understood  the 
outputs.  The  emphasis  on  job-orientation  utilized  in  this  system  took  much 
explaining, mainly because the audience for outputs was training managers,
not performance technologists as discussed in the previous paper.

Further  Analysis  and  Direction 

The question of external evaluation in the CF has received renewed
interest in 1985 at National Defence Headquarters and other Commands have
instituted their own programmes and studies. The authors believe that
External Evaluation as applied to Pilot Training may demand different emphasis
than that required for Naval Technician training in that output analyses
have different descriptors in order for alleviable action to take place in
correcting performance deficiencies. The major demand appears to be in
developing a system which will monitor and correct 'overtraining' in order
to optimize training system efficiency. Apart from additional questions
being asked in the field using an existing External Evaluation system to
'flag' a possible overtraining area, the authors at this stage feel that the
internal evaluation and professional design processes would provide more
impact in gauging and correcting overtraining. Experimental designs could
then be utilized to verify, with the assistance of an External Evaluation
System, whether differing training methods or standards result in acceptable
field performance. The authors have been tasked to examine this area in
1986.

Conclusion 

This brief resume of proceedings is intended to provide interim
results on the development of an External Evaluation system within a CF
context. Future developments using improved technological advances such as
optical scanning techniques and use of video display terminals as inputs to
external evaluation systems herald a heightened future for this often
neglected process.

Occupational Learning Difficulty:
A Construct Validation Against Training Criteria

Occupational learning difficulty is defined as the time required to
learn  to  satisfactorily  perform  occupational  tasks.  It  is  expressed  in 
terms  of  a  quantitative  index  which  is  produced  on  the  basis  of  information 
obtained  from  structured  job  analysis  questionnaires  developed  and 
administered  by  the  United  States  Air  Force  Occupational  Measurement 
Center.  For  any  given  job  specialty,  a  learning  difficulty  (LD)  index  is 
derived  by  combining  relative  ratings  of  task  learning  difficulty  obtained 
from  senior-level  technicians  and  benchmark  ratings  of  task  learning 
difficulty  obtained  from  external  occupational  experts.  The  adjusted  task 
ratings  resulting  from  this  combination  are  then  weighted  and  aggregated  to 
produce  an  occupational-level  index  of  learning  difficulty  which  can  be 
meaningfully  compared  across  job  specialties.  A  more  detailed  description 
of  the  derivation  procedure  has  been  provided  elsewhere  (Weeks,  1984). 
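
The combination step can be pictured with a minimal sketch. It assumes, purely for illustration, that relative ratings are rescaled to the benchmark scale by a least-squares line fitted on the tasks that carry both kinds of rating, and that the adjusted ratings are then averaged using task weights (for example, the share of job time spent on each task). The actual procedure is the one documented in Weeks (1984); the function names, the weighting scheme, and all numbers below are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def ld_index(relative, benchmark, weights):
    """Occupational-level index from task-level ratings.

    relative:  task -> relative rating (available for every task)
    benchmark: task -> benchmark rating (available for a subset of tasks)
    weights:   task -> aggregation weight (assumed, e.g. share of job time)
    """
    common = sorted(benchmark)
    a, b = fit_line([relative[t] for t in common],
                    [benchmark[t] for t in common])
    # Put every relative rating on the benchmark scale.
    adjusted = {task: a + b * rating for task, rating in relative.items()}
    total_weight = sum(weights[t] for t in adjusted)
    return sum(adjusted[t] * weights[t] for t in adjusted) / total_weight


# Hypothetical three-task specialty.
relative = {"t1": 2.0, "t2": 4.0, "t3": 6.0}
benchmark = {"t1": 5.0, "t3": 15.0}
weights = {"t1": 0.5, "t2": 0.25, "t3": 0.25}
example_index = ld_index(relative, benchmark, weights)
```

Because every specialty is expressed on the common benchmark scale before aggregation, indexes computed this way would be comparable across specialties, which is the property the text relies on.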

Once an LD index was available for most Air Force enlisted job
specialties and its reliability and validity had been demonstrated, it
served as a job-centered frame of reference for various management
decisions. In this context, there are both primary and special applications
of learning difficulty information. Primary applications involve both
personnel and training management.

For example, to the extent practical, the order of job aptitude
requirement minimums is established so as to correspond to the order of job
specialties in terms of learning difficulty. This application contributes
to  the  optimal  allocation  of  talent  by  ensuring  that  job  specialties  which 
are  highest  in  learning  difficulty  are  manned  by  enlistees  having  the 
highest  aptitudes. 

Learning  difficulty  indexes  are  also  applied  during  the  initial 
job-offer  process.  Air  Force  enlistees  are  assigned  to  job  specialties  at 
military  entry  processing  stations  on  the  basis  of  a  computer-based, 
person-job  match  algorithm  known  as  PROMIS  (Hendrix,  Ward,  Pina,  &  Haney, 
1979).  One  of  the  policies  implemented  by  PROMIS  is  to  offer  the  most 
difficult  job  specialties  to  the  most  talented  enlistees.  This  process 
obviously  requires  information  concerning  enlistee  aptitudes  and  information 
concerning  job  difficulty.  Within  the  PROMIS  system,  job  difficulty  is 
defined  in  terms  of  the  LD  index. 
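
The single policy described above can be sketched as a rank-order pairing. This is an illustrative simplification only: PROMIS implements several policies at once, and the function name, identifiers, and scores below are hypothetical.

```python
def offer_jobs(enlistee_aptitude, job_difficulty):
    """Pair enlistees with jobs by rank.

    The enlistee with the highest aptitude score is offered the job with
    the highest LD index, the second highest the second, and so on.
    """
    people = sorted(enlistee_aptitude, key=enlistee_aptitude.get, reverse=True)
    jobs = sorted(job_difficulty, key=job_difficulty.get, reverse=True)
    return dict(zip(people, jobs))


# Hypothetical aptitude scores and LD index values.
offers = offer_jobs(
    {"A": 60, "B": 85, "C": 72},
    {"J1": 90, "J2": 140, "J3": 115},
)
```

In this toy case enlistee B, who has the highest aptitude, is offered J2, the specialty with the highest learning difficulty.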

Another  primary  application  involves  decisions  concerning  mode  of 
training.  Enlisted  job  specialties  are  designated  as  either  category  A,  B, 
or C skills. For category A skills, the mode of training is formal,
resident or school-house training. For category C skills, the mode of
training  is  on-the-job  training  (OJT).  For  category  B  skills,  the  mode  of 
training  can  be  either  formal  resident  training  or  OJT.  The  LD  index  is  one 
of  several  inputs  to  the  decision  process  associated  with  determining  mode 
of  training. 

In  addition  to  these  primary  applications,  there  have  been  noteworthy 
special  applications.  The  LD  index  was  used  as  an  empirical  basis  for 
justifying  Air  Force  personnel  quality  standards  in  response  to  inquiries  by 
the  House  and  Senate  Armed  Services  Committees  during  the  development  of  the 
1985 Defense Appropriations Bill. Furthermore, it has been advanced as an
empirical basis for Air Force job and training requirements during
investigations by oversight bodies such as the General Accounting
Office and the Air Force Audit Agency.

Problem 

Because  of  the  importance  of  such  applications,  the  validity  of  the  LD 
index  is  an  issue  of  considerable  interest.  Burtch,  Wissman,  and  Lipscomb 
(1980)  were  the  first  to  seriously  address  this  problem.  Their  approach 
consisted  of  correlating  relative  ratings  of  task  learning  difficulty  by 
senior-level technicians with benchmark ratings by occupational experts.
Observed correlations ranged from .54 to .96 for 100 different job
specialties.  These  results  provided  evidence  of  the  convergent  validity  of 
the  benchmark  ratings.  Although  this  study  was  comprehensive  in  that 
separate  analyses  were  conducted  for  several  different  job  specialties,  it 
cannot  be  considered  sufficient  by  itself.  Validation  efforts  must  take 
into  account  the  functional  role  of  learning  difficulty  information  in 
management  applications.  For  all  the  applications  previously  described,  the 
occupational-level  index  rather  than  task-level  ratings  of  learning 
difficulty  served  as  the  referent  for  management  decisions.  Consequently, 
there  appears  to  be  a  need  to  evaluate  the  validity  of  the 
occupational-level  index  of  learning  difficulty. 

Method 

Because the LD index is not applied to predict some criterion, construct
validation is considered to be more appropriate than predictive validation.
With construct validation, the goal is to evaluate the intrinsic meaning of
some measure of interest. This is accomplished by evaluating both
convergent and discriminant validity (Campbell & Fiske, 1959). Because the
LD index is a measure of learning time, strong relationships with the
training time for an initial-skills course associated with a specialty would
be expected. Also, because the LD index is a measure of a job property,
strong relationships with measures of personnel attributes would not be
expected.

In an independent research project devoted to the development of a
covariance-structure model of Air Force technical training, it was necessary
to collect information concerning occupational learning difficulty as well
as measures of student input, course content, and training outcome variables
for several initial-skills courses. Therefore, it was decided to extend
that effort to include an examination of the construct validity of the LD
index. Only analyses relevant to the construct validation of the LD index
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will  be  discussed  here.  Details  concerning  the  development  of  the 
covariance-structure  model  of  initial-skills  training  are  provided  by 
Mumford,  Weeks,  Harding,  and  Fleishman  (1985). 

Samples  of  subjects  and  courses  used  for  the  present  analyses  were 
identical  to  those  employed  by  Mumford  et  al.  (1985).  Courses  were  selected 
from  among  approximately  200  initial-skills  courses  administered  by  various 
technical  training  centers  under  Air  Training  Command.  Courses  were 
selected  to  provide  a  representative  sample  with  respect  to  numerous 
criteria  including  course  content,  student  flow,  training  costs,  and 
aptitude  area.  Subjects  were  sampled  so  as  to  provide  a  minimum  of  50 
students  for  each  course.  These  procedures  provided  a  total  of  5,970 
students  and  48  initial-skills  courses. 

The  independent  variable  examined  in  the  present  study  consisted  of  the 
LD  index  obtained  for  the  job  specialty  associated  with  each  of  the  48 
initial-skills  courses.  For  each  specialty,  the  index  consisted  of  an 
aggregate  value  obtained  by  deriving  the  average  learning  difficulty  of 
first-term  positions  in  the  specialty.  This  procedure  was  considered 
appropriate  because  initial-skills  courses  are  designed  to  provide  training 
for  tasks  likely  to  be  encountered  during  the  first  term  of  service. 

A  wide  variety  of  dependent  variables  falling  into  three  broad 
categories  were  examined.  The  dependent  variables  included  those  which  were 
expected  to  be  highly  related  to  the  LD  index  as  well  as  those  which  were 
not.  For  dependent  variables  in  each  category,  detailed  descriptions  of  the 
source  of  data  and  measurement  process  are  provided  by  Mumford  et  al.  (1985). 

The  first  category  of  dependent  variables  consisted  of  measures  of 
student  inputs.  These  measures  included  (1)  students'  average  scores  on  the 
composite  of  the  Armed  Services  Vocational  Aptitude  Battery  which  serves  as 
the  basis  of  the  aptitude  requirement  for  entry  into  the  course,  (2)  average 
reading  grade  level  as  measured  by  the  Air  Force  Reading  Abilities  Test,  (3) 
the  average  academic  motivation  of  students  in  the  course  as  indexed  by  the 
average  number  of  difficult  high  school  courses  completed,  (4)  average 
educational  level  of  students  in  the  course,  (5)  educational  preparation  as 
indexed  by  the  average  number  of  recommended  high  school  course 
prerequisites  completed  by  students  in  the  course,  and  (6)  the  average  age 
of  students  in  the  course. 

The  second  category  of  dependent  variables  consisted  of  measures  which 
represent  outcomes  of  training.  These  measures  included  (1)  average  final 
course  grades  for  students  in  the  course  as  indexed  by  the  average  score  on 
end-of-block  written  tests,  (2)  average  number  of  hours  of  special 
individualized  assistance  (SIA)  provided  students  by  course  instructors,  (3) 
the  number  of  academic  counseling  sessions  provided  students,  (4)  the  number 
of  nonacademic  counseling  sessions,  (5)  washback  time  as  indexed  by  the 
average  number  of  retraining  hours  provided  students,  and  (6,  7)  the 
academic  and  nonacademic  student  attrition  rates  for  the  course. 

The  third  category  of  dependent  variables  consisted  of  measures  of 
properties  of  the  initial-skills  course  and  are  described  as  course  content 
variables.  This  category  included  (1)  course  length  (in  hours),  (2)  course 
diversity  as  reflected  by  the  number  of  different  units  of  instruction,  (3) 
experts' average rating of the abstract knowledge requirement of the course,
(4) the expected student attrition rate for the course, (5) the number of
students  per  instructor,  (6)  the  average  number  of  months  of  instructional 
experience  of  the  instructors  assigned  to  the  course,  (7)  average  ratings  of 
the  quality  of  instruction  provided  by  course  instructors,  (8)  manning 
requirements  as  reflected  by  the  availability  of  a  selective  reenlistment 
bonus  for  the  specialty  associated  with  the  course,  (9)  length  of  academic 
day  (in  hours),  (10)  the  number  of  instructional  aids  employed  per 
instructional  hour,  (11)  the  percentage  of  training  hours  devoted  to 
hands-on  instruction,  (12)  the  frequency  of  formal  feedback  per  hour  of 
instruction,  (13)  practice  or  the  number  of  hours  devoted  to  a  unified  body 
of material, (14) student flow or total number of students passing through
the course per year, and (15) the reading difficulty of course materials.

Analyses undertaken to examine the construct validity of the LD index
were straightforward. Scores on the various student input, training
outcome, and course content variables were correlated with the LD index.
With regard to analyses involving student input variables, it is important
to note that the LD index for the job specialty associated with a course was
assumed to be applicable to all students in the course. Once the
correlational analyses were completed, the pattern of relationships
indicated by the correlational data was examined to evaluate the
discriminant and convergent validity of the LD index.
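
The correlational step can be sketched as follows. The paper does not state which significance test was applied, so the sketch assumes Fisher's z approximation for the two-tailed probability that an observed r is a random deviation from a population r of zero; the function names are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-tailed p for H0: population r = 0 (Fisher z approximation)."""
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def flag(p):
    """Significance flag: ** (p <= .01), * (p <= .05), otherwise NS."""
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return "NS"
```

With 48 courses, `flag(approx_p_value(0.501, 48))` gives `**`, `flag(approx_p_value(-0.314, 48))` gives `*`, and `flag(approx_p_value(-0.186, 48))` gives `NS`, which agrees with the flags reported for those three coefficients in Table 1.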


Results  and  Discussion 

Table  1  presents  the  mean  and  standard  deviation  of  each  dependent 
variable and its correlation with the LD index. Examination of the student
input variables indicates that none of the observed correlations was
statistically significant. Because the LD index is a measure of a job property and the
student  input  variables  are  measures  of  personnel  attributes,  this  outcome 
was  expected.  The  fact  that  this  expectation  was  confirmed  by  the  results 
of  the  present  analyses  lends  support  to  the  discriminant  validity  of  the  LD 
index. 

None of the observed correlations for the training outcome variables was
statistically significant. This outcome was also expected. As a result
of  efforts  to  develop  the  covariance-structure  model  of  training,  we  have 
gained  some  insight  into  the  complex  interrelationships  that  combine  to 
influence  training  outcomes.  Largely  because  of  the  instructional  systems 
design  process  which  uses  information  concerning  job  tasks  as  one  basis  of 
course  design,  the  LD  index  is  conceived  of  as  having  a  primal  influence  on 
course  content.  However,  numerous  student  input  and  course  content 
variables combine to influence training outcomes. Consequently, the
relationships between the LD index and training outcomes are indirect, being
moderated by several other variables.

The  strongest  relationships  produced  by  the  LD  index  were  with  various 
course  content  variables.  This  was  expected  because  as  previously 
indicated,  the  design  of  initial-skills  training  is  guided  by  job  content. 
For  instance,  because  the  LD  index  represents  learning  time,  it  was  expected 
that  a  positive  relationship  would  exist  between  the  LD  index  and  course 
length.  The  fact  that  a  moderate  positive  correlation  was  observed 
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for  these  two  variables  tends  to  argue  for  the  convergent  validity  of  the 
index. Moreover, it would be expected that the LD index would be
positively  related  to  indices  of  course  subject  matter  difficulty  to  the 
extent  that  it  is  an  intrinsically  meaningful  index.  The  moderate  positive 
relationships  observed  between  the  LD  index  and  course  diversity,  abstract 
knowledge  requirement  and  expected  attrition  rate  support  this  expectation. 


In  addition  to  these  relatively  direct  relationships,  it  was  expected 
that  the  LD  index  would  yield  a  number  of  more  diffuse  relationships.  For 
instance, it might be expected that fewer students per instructor and more
experienced instructors would be a means of compensating for task learning
difficulty. The observed relationships between the LD index and these two
variables lend support to this expectation. Further, it might be expected
that  the  overall  quality  of  instruction  would  be  low  for  tasks  of  high 
learning  difficulty.  Again,  the  observed  relationship  between  these  two 
variables  supports  this  expectation.  The  positive  relationship  between  the 
LD  index  and  manning  requirements  may  be  attributed  to  the  fact  that 
specialties  higher  in  learning  difficulty  generally  require  training  which 
is  highly  valued  in  the  civilian  work  place.  The  loss  of  military-trained 
technicians  to  the  civilian  labor  market  would,  in  turn,  lead  to  a  greater 
demand  for  personnel. 


Table 1.  Means and Standard Deviations of Dependent Variables
and Bivariate Correlations with the Learning Difficulty Index

VARIABLES                              MEAN     STANDARD         r
                                                DEVIATION

Student Inputs (N = 5,970 students)
  Aptitude Composite Score           88.800      ’.8.30        .0*b    NS
  Reading Grade Level                11.-00        1.00                NS
  Academic Motivation                39.UOO       13.-0      .'.'19    NS
  Education Level                     2.180         .45       -.012    NS
  Educational Preparation               .9*                   -.115    NS
  Age                                20.100        2.20        .0)5    NS

Training Outcomes (N = 48 courses)
  Final Course Grade                 85.200        7.61        .076    NS
  SIA Time                            b.5©0       15.40         .US    NS
  Academic Counseling                 1.-50        3.47         w-3    NS
  Nonacademic Counseling               ,;:o        1.56       -.0'H    NS
  Washback Time                      11.100       51.50        .066    NS
  Academic Attrition                   .028         .16        .01-    NS
  Nonacademic Attrition                ,00*         .06       -.002    NS

Course Content (N = 48 courses)
  Course Length                     -13.900      309.30        .501    **
  Course Diversity                   54.800       -3.30        .536    **
  Abstract Knowledge Requirement      2.-20         .98        ,-b7    **
  Expected Attrition Rate              .098         .03        .450    **
  Student-Instructor Ratio            9.100        4.80       -.556    **
  Instructor Experience              32.500       14.70        .59!    **
  Instructor Quality                  2.500         .15       -.314    *
  Manning Requirements                • 4-0         .49        .423    **
  Day Length                           .480         .44       -.033    NS
  Instructional Aids                   .270         .10        .161    NS
  Hands-on Practice                    .-10         .13       -.186    NS
  Frequency of Feedback                .340         .12       -.191    NS
  Practice                            8.520        3.17       -.074    NS
  Yearly Student Flow              1943.600     2662.50       -.200    NS
  Reading Difficulty                 10.980         .63        .031    NS

**  = Probability ≤ .01 that observed r is a random deviation from a
      population r of zero.
*   = Probability ≤ .05 that observed r is a random deviation from a
      population r of zero.
NS  = Probability > .05 that observed r is a random deviation from a
      population r of zero.
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Conclusion 


In reviewing the results of these analyses, it seems reasonable to
conclude that the occupational learning difficulty index displays construct
validity. The index was found to be moderately related to both course
length and a number of indices of course subject matter difficulty, lending
support to the convergent validity of the index. Moreover, the finding
that  strong  relationships  did  not  exist  between  the  LD  index  and  a  number 
of  student  input  and  training  outcome  variables  tends  to  support  the 
discriminant  validity  of  the  index.  Overall,  the  evidence  produced  by 
these  analyses  appears  to  indicate  satisfactory  construct  validity  for  the 
occupational-level  index  of  learning  difficulty. 
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News  about  CAT  in  the  German  Federal  Armed  Forces  (GFAF) 

Wolfgang Wildgrube

Psychological  Service,  Bonn,  West  Germany 


Introduction 

At the last annual MTA conference, held in Munich in 1984, a workshop
on "CAT in Germany" presented the first activities and experiences
with computerized testing in the GFAF (named CAT I). Since 1983 the
GFAF has evaluated two experimental installations for computerized
testing at the recruiting centers in Munich and Hannover, taking the
practical approach of administering the conventional tests by computer
(Wildgrube, 1985 b; see also the contributions of Angermüller and
Kulling).

Meanwhile CAT has made an important step forward in the GFAF. First,
much of the paperwork needed to prepare computerized testing has been
completed, so that in 1987 CAT systems can be operational at the
four recruiting centers for volunteers and the recruiting center
for officer candidates, converting the conventional procedure to
computer application. Second, a major change is planned at the
recruiting centers for draftees. The final goal is to complete the
medical examination and psychological testing, supplemented by
psychological counseling, on the same day, so that each draftee
knows, after one day at the recruiting center, the date, location,
and unit for his time in the service, that is, for the start of his
basic training. Individualised testing by computer is therefore
necessary for this one-day examination of draftees. The corresponding
pilot project started in January 1985 in Hannover using one of the
CAT installations.

Aptitude  Classification  Battery 

In the GFAF the Aptitude Classification Battery (EVT, its German
abbreviation) is in use as the standard entrance examination for
draftees and volunteers (similar to the ASVAB; officer candidates
have a special test battery). A major revision of the EVT battery is
proceeding in parallel with the CAT developments, so that the GFAF
starts  in  January  1986  with  the  following  revised  test  battery: 
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20  Items/8  Alternatives  18  Min. 
Figure Matrices Test (FMT)        20 Items/8 Alternatives          18 Min.

Word Relation Test (WBT)          20 Items/5 Alternatives          4 1/2 Min.

Arithmetic Reasoning Test (RT)    20 Items/Input of the            14 Min.
                                  Results (paper/pencil for
                                  notes)

Spelling (Orthographical          60 Items/4 Alternatives          12 1/2 Min.
Test; RST)

Mechanical Ability Test (MT)      20 Items/5 Alternatives          15 Min.

Electrotechnical Comprehension    20 Items/5 Alternatives          20 Min.
Test (EKT)                        (Pretest after RT:
                                  8 Items, 6 Min.)

Reaction Test (RP)                64 Items/6 Alternatives          3,41 Min.
                                  Input of the Results

Radio Test (FT)                   150 Items/3 Alternatives         3,30 Min.
                                  Input of the Results

Signal Test (SigT)                18 Items/4 Alternatives          3 Min.
                                  Input of the Results

Doppler Test (DopT)               20 Items/3 Alternatives          4 Min.
                                  Input of the Results

After the six conventional subtests, until now administered by
paper and pencil, follow four special tests presented by machines
("apparative" tests). This fixed sequence of subtests will be
used during the routine application of the EVT by computer.
Changes are possible at any time through interrupts by the proctor,
for example omitting the signal test or stopping after the
electrotechnical comprehension test.
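
A minimal sketch of a fixed sequence with proctor interrupts is given below. The subtest order follows the battery listing above; the function name and its two parameters are hypothetical and do not describe the actual CAT software.

```python
EVT_SEQUENCE = [
    "Figure Matrices Test",
    "Word Relation Test",
    "Arithmetic Reasoning Test",
    "Spelling",
    "Mechanical Ability Test",
    "Electrotechnical Comprehension Test",
    "Reaction Test",
    "Radio Test",
    "Signal Test",
    "Doppler Test",
]


def plan_session(sequence, omit=(), stop_after=None):
    """Return the subtests to administer, in their fixed order.

    omit:       subtests the proctor skips (e.g. the signal test)
    stop_after: subtest after which the proctor ends the session
    """
    plan = []
    for subtest in sequence:
        if subtest not in omit:
            plan.append(subtest)
        if subtest == stop_after:
            break
    return plan


without_signal = plan_session(EVT_SEQUENCE, omit={"Signal Test"})
conventional_only = plan_session(
    EVT_SEQUENCE, stop_after="Electrotechnical Comprehension Test"
)
```

Keeping the order in one list and treating interrupts as filters on that list preserves the fixed sequence while still allowing the two kinds of change mentioned in the text.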

New Hardware and Software

Given the rapid developments and changes in the area of personal
computers, the definition of new hardware and, furthermore, new
software for CAT was necessary (named CAT II). Besides the change
from PCs with 8-bit processors to PCs with 16-bit processors, the
new equipment will, as an expansion of CAT I, also include the
four "apparative" tests, namely reaction test, signal test,
doppler test, and radio test, so that the whole Aptitude
Classification Battery can be presented by computer.

It was agreed that standard hardware and software were not
sufficient for CAT at fixed locations (recruiting centers) in the
GFAF. At the end of last year a booklet with the detailed
requirements was worked out in cooperation between the GFAF and
the German firm ZAK. The new hardware and software will be
delivered in December 1985. One installation of the new equipment
is assigned to the recruiting center for
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draftees  in  Munich,  the  other  one  is  a  twin-configuration  for 
the recruiting center in Hildesheim, which should be delivered
in  Spring  1986. 

In addition to the current development work by the firm ZAK, various
firms are at present submitting different concepts and corresponding
cost estimates for meeting the requirements of the GFAF concerning
CAT at, for example, fifty recruiting centers for draftees.

The  new  CAT  equipment  has  the  following  characteristics: 

-  A local area network (LAN) is in use with 15 work stations
for examinees; different test batteries can be applied at each
station.

-  The testing session will be monitored at a central place by
an IBM AT with hard disk. A second IBM AT, also linked into the
LAN, is located in a separate room and will be used for input of
personnel/biographical data and for output of results;
furthermore, this second central place is prepared as backup
for the master.

-  Backup arrangements are prepared at different levels, e.g. the
central places for monitoring (2 IBM AT), the work stations
(15 stations available), power interrupt (additional equipment
and software tools), test results (internal twin storage on
disk). No break longer than 30 minutes is allowed, nor is the
loss or the repetition of a whole subtest.

-  A work station can be described by these characteristics:

+  a white screen with black items, 768 x 1024 pixels;

+  a headset for voice output;

+  a special keyboard with a part for the converted paper and
pencil tests (ten digits and the green and red function
keys) and with four special parts for the "apparative"
tests (e.g. reaction test).

-  At the central place of the CAT station the proctor monitors
the test session, using different menus. A maximum of four
tasks are active, so that an overview of the state of
testing is possible at any time.

-  The second proctor sits in the separate room and types in the
personnel data (at the moment the system provides 18 menus)
or starts the output on a matrix printer (the different output
is tailored to the requirements of the recruiting center, at
subtest, person, or item level). The reference criterion for
all data is the personnel ID number (similar to the social
security number), so that all data (personnel data, biographical
data, test data/results) are stored in a data base at the end
of the whole psychological examination. The possibilities for
transfer to a mainframe computer are being considered: online
at subtest level, and via tape drive at item level for
follow-up studies.

-  The testing session starts for each examinee with an
introduction or learning phase. There is the chance to get practice
in the use of the keyboard and to learn the kind of testing.
If an example is solved or typed in incorrectly, the program
presents the example item once more. This phase and the
example items before each subtest are supported by voice
output via headset. The whole text (without the items within
the subtests) is prepared by delta modulation, stored on the
hard disk at each work station, and then monitored (screen and
voice) by program during the test session.
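
The behaviour of the learning phase can be pictured with a small sketch. It assumes that an example item is shown at most twice (the text says the item is presented "once more" after an error); the function name, its parameters, and the limit are hypothetical and do not describe the actual CAT software.

```python
def run_example(present_item, correct_answer, max_presentations=2):
    """Show a practice item and repeat it after a wrong or mistyped answer.

    present_item:      callable that displays the item and returns the
                       examinee's keyed response
    max_presentations: assumed upper limit on how often the item is shown
    Returns (number of presentations used, whether it was finally solved).
    """
    for presentation in range(1, max_presentations + 1):
        if present_item() == correct_answer:
            return presentation, True
    return max_presentations, False


# Scripted examinee who mistypes first and answers correctly on the repeat.
scripted = iter(["3", "5"])
result = run_example(lambda: next(scripted), "5")
```

Passing the presentation routine in as a callable keeps the retry logic separate from screen and voice output, so the same loop could serve the introduction and the example items before each subtest.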

CAT  Center 

At the recruiting centers the testing station is installed only
for the application of the tests. After entering the password
the proctor starts the session using the different menus, while
a special password is necessary for data handling or for use
of utilities. All changes to the items or the item pool, and all
modifications of the software or the testing procedures, will be
carried out in the CAT center, which is located at the
Federal Armed Forces Office in Bonn.

By the end of this year the following hardware will be
delivered: 1 central place (IBM AT), 1 work station, printer,
plotter (for graphical items), tape drive (for data transfer to
mainframe at item level). Furthermore, the essential software
will be prepared for Bonn:

-  Source code of the whole application software.

-  Compilers for Basic, FORTRAN, Pascal, C (the greatest part of
the software is written in C).

-  Tools used by the firm ZAK for the software developments.

-  Item editor for the input of graphical/nongraphical items,
instructions, voice modules, and for storage in an item pool
or data base.

-  Item editor for assembling different subtests, modifying testing
procedures, inserting instructions, and creating new test
batteries.

-  Utilities for data handling and for the management of data in
a data base.

-  Statistical software for simple analyses on the personal computer.

All changes and modifications are prepared in the CAT center in
Bonn, and the floppies containing the newest version are distributed
to the recruiting centers for the daily routine testing.

In addition the CAT center is the link between the GFAF and the firm,
so that all hardware troubles are reported first to the CAT center.

The CAT center will thus be the central place for all aspects of
CAT in the GFAF concerning hardware, software, item pools, data
of the examinees, as well as for the scientific evaluation.

Experiences and Results

The  two  pilot  installations  in  Munich  and  Hannover  (CAT  I)  are 
in  use  daily  and  the  CAT  results  are  applied  to  selection  and 
classification  decisions.  Now  the  data  transfer  is  available 
from  CAT  to  mainframe  computer  via  tape  drive  for  different 
statistical  analyses. 

Besides the computerized testing, information has been collected
by an ad hoc questionnaire. Here are some results from selected
questions collected in Hannover (also classified by the four
educational levels) and Munich.

                                        Hannover (N=223)         Munich
                                        level of education       (N=175)
                                        1    2    3    4    T       T

 1.  When I entered the testing room
     and saw the testing equipment, I
     was curious as to what expected
     me
     1.  I was very curious             2   53   78   32  165     136
     2.  I was not so curious           -   22   18    9   49      32
     3.  I was not curious at all       -    2    5    2    9       7

14.  Comparing the computerized test
     version to the paper and pencil
     test, I think
     1.  that I like the computerized   1   61   80   32  174     134
         form better
     2.  that I like the paper and      1   11   15    8   35      33
         pencil form better

15.  I often play video games (e.g.
     "Star War")
     1.  yes                            1   14   17    8   40      39
     2.  no                             1   59   80   31  171     129

16.  I have experience with home
     computers
     1.  yes                            -    6   22    8   36      21
     2.  no                             2   69   75   31  177     149

17.  I participated in this test
     with pleasure
     1.  if yes, why                    2   70   92   34  198     145
     2.  if no, why not                 -    5    3    4   12      18

The results indicate a high acceptance of computerized testing,
or rather of this kind of non-group testing by computer. The
examinees prefer the CAT application even if they have no
experience with home computers or video games. Computerized
testing will therefore soon proceed without any problems, much
like conventional paper and pencil testing.

CAT offers the chance to record more data than paper-and-pencil
testing, so that more detailed analyses can be made. One aspect is
the item solution time, which is recorded for each item. At present,
however, new models in test theory and further basic research are
necessary to interpret these time-based data in addition to the
ability parameter.

Another important point concerns the difference in test scores
over the course of the day, shown in the following table.


                        Scores                Time used per subtest (seconds)
Test  (time)     morning     noon      p       morning     noon      p

WAT   (11.00)     13.86     13.01    .002        234        246    .000
FDT   (11.00)     14.46     13.76    .051        505        510    .504
RT    (11.00)      9.99      9.13    .008        826        823    .690
MT    (11.30)     11.59     10.70    .001        712        711    .890
RST   (12.00)  [illegible]  26.00    .004        172        175    .016
EKT   (12.00)      6.63      5.55    .000        552        561    .202

There are significant and relevant differences between the two
groups (sample sizes approximately equal) in the scores of all
six subtests. Also remarkable are the significant values for the two
subtests "word analogy" and "spelling" concerning the time used
per subtest. For the other subtests, the table above shows very
similar times for solving the items in a subtest, while the scores
are different. Further research is needed on this aspect, so that
different norms can be used for the morning and for the noon
session if necessary.


EFFORTS TOWARDS THE IMPROVEMENT OF THE
COMPUTERIZED ADAPTIVE SCREENING TEST (CAST)


Deirdre  J.  Knapp,  Rebecca  M.  Pliske,  and  Richard  M.  Johnson 
U.S.  Army  Research  Institute  for  the 
Behavioral and Social Sciences*


The  Computerized  Adaptive  Screening  Test  (CAST)  was  designed  by  the  Navy 
Personnel  Research  and  Development  Center  (NPRDC)  under  the  sponsorship  of 
the  Army  Research  Institute  (ARI)  to  provide  a  prediction  of  prospects’  Armed 
Forces  Qualification  Test  (AFQT)  scores  at  recruiting  stations.  This  paper 
briefly  discusses  the  development  of  CAST  and  summarizes  current  efforts  to 
enhance  its  utility. 

Background 


Armed Forces applicants fall into certain mental categories on the basis of
their AFQT scores. AFQT, which is intended to be a measure of trainability,
is derived from a linear combination of subtest scores (i.e., WK, AR, PC, and
one-half of NO) on the Armed Services Vocational Aptitude Battery (ASVAB). In
the Army, individuals who score at or above the 50th percentile (mental
categories 1, 2, and 3A) are eligible for special options and benefits such as
the 2-year Enlistment Option and the Army College Fund. Applicants who score
between the 31st and 49th percentiles on AFQT (mental category 3B) qualify
for enlistment but are not eligible for special options. Lastly, those
individuals who score below the 31st percentile (mental categories 4A, 4B, 4C,
and 5) are regarded as being low priority candidates for enlistment.
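
The cut points just described can be written down compactly. The following is a minimal Python sketch for illustration only; it is not part of CAST or of any Army software. The function name `afqt_group` and the assumption that AFQT percentile scores are integers from 1 to 99 are ours.

```python
def afqt_group(percentile: int) -> str:
    """Map an AFQT percentile to one of the three category groups
    that matter to recruiters, using the cut points given in the text."""
    # Assumption: AFQT percentile scores are integers from 1 to 99.
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    if percentile >= 50:
        return "1-3A"          # eligible for special options and benefits
    if percentile >= 31:
        return "3B"            # qualifies for enlistment, no special options
    return "4A and below"      # low priority candidates


if __name__ == "__main__":
    for score in (72, 49, 30):
        print(score, afqt_group(score))
```

The three return values correspond to the three groups of mental categories that the rest of the paper treats as critical for recruiters.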

It is vital that recruiters have access to information which predicts
prospects' AFQT performance for several reasons. For example, recruiters'
missions specify not only the number of recruits to be enlisted, but also the
quality of those recruits as determined by mental category classification.
Further, if a prospect appears to have virtually no chance of producing an
acceptable AFQT score, the recruiter may choose to discourage him or her from
further interest in the Army. The recruiter can then spend more time selling
the Army to more promising prospects. Finally, if a prospect appears to be
of average quality, the recruiter may not want to spend much time describing
special options and benefits to the individual. On the other hand, if the
recruiter does not sell the options and benefits to those individuals who are
likely to be eligible for them, he or she is failing to use a powerful sales
tool. Clearly, recruiters can enhance their performance if they effectively
use a valid predictor of prospects' AFQT performance.


*The views expressed in this paper are those of the authors and do not
necessarily reflect the view of the US Army Research Institute or the
Department of the Army.

Description


As its name indicates, CAST is a computerized adaptive test. Adaptive
tests are constructed so that they are tailored to fit each examinee.
This is done by administering items which have optimal discriminability given
a particular examinee's ability. This contrasts with traditional testing,
where all examinees respond to the same items, regardless of differences in
examinee ability. Hence, adaptive tests are more efficient to use than
traditional tests because a comparable amount of information can be gained
from fewer test items.

The item pool for CAST was developed by researchers at the University of
Minnesota (cf. Moreno, Wetzel, McBride, & Weiss, 1983) for use in the
development of a computerized adaptive version of ASVAB. Researchers at NPRDC
developed the software capable of administering items predictive of AFQT on
the Army's microprocessor system known as JOIN.

The current operational version of CAST consists of 78 word knowledge (WK)
items and 225 arithmetic reasoning (AR) items. All items are multiple choice
with a maximum of five response alternatives. The items were developed using
the three-parameter logistic ogive item response model (Birnbaum, 1968); thus
each item has three parameters (discrimination, difficulty, and guessing)
associated with it. Test items for CAST were chosen so that the
discrimination parameter values would be greater than or equal to .78; the
difficulty parameter values would range between +2 and -2; and the guessing
parameter values would be less than or equal to .26. The ability estimates
yielded by CAST are based on the Bayesian sequential scoring procedure
discussed by Jensema (1977). The stopping rule is ten WK items and five AR
items.
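
To make the item model concrete, here is a minimal Python sketch of the three-parameter logistic response function, its item information, and the item screening thresholds quoted above. It is an illustration under stated assumptions, not the CAST software: the scaling constant D = 1.7 is the conventional choice and is not given in the text, the function names are hypothetical, and picking the most informative item at the current ability estimate is one common selection rule, which the paper does not confirm as the rule CAST uses.

```python
import math

D = 1.7  # Assumption: conventional scaling constant; not stated in the text.


def p_correct(theta, a, b, c):
    """3PL probability of a correct answer for ability theta, given
    discrimination a, difficulty b, and guessing parameter c."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def item_information(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = p_correct(theta, a, b, c)
    return (D * a) ** 2 * ((p - c) / (1.0 - c)) ** 2 * (1.0 - p) / p


def meets_cast_criteria(a, b, c):
    """Screening thresholds quoted in the text for CAST items."""
    return a >= 0.78 and -2.0 <= b <= 2.0 and c <= 0.26


def most_informative(theta, pool):
    """Pick the (a, b, c) item with maximum information at theta.
    Assumption: maximum-information selection, used here for illustration."""
    return max(pool, key=lambda item: item_information(theta, *item))


if __name__ == "__main__":
    # Hypothetical item parameters, not taken from the CAST item pool.
    pool = [(1.0, -1.0, 0.2), (1.0, 0.0, 0.2), (1.0, 1.0, 0.2)]
    print(most_informative(0.0, pool))
```

An item whose difficulty lies close to the examinee's current ability estimate carries the most information, which is why an adaptive test can match the precision of a longer fixed test with fewer items.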

Validation  Information 

There are three validation efforts associated with CAST. The initial
validation study was conducted at the Los Angeles Military Entrance Processing
Station (MEPS) with a sample of 312 U.S. Army applicants (Sands & Gade,
1983). The correlation between CAST scores and AFQT scores was .85. The
second data collection effort took place in Army recruiting stations in the
midwestern region of the U.S. during the first two months of 1984 (Pliske,
Gade, & Johnson, 1984). CAST scores were linked to subsequent AFQT
performance via the social security numbers (SSNs) of the prospects. More
specifically, recruiters recorded prospect SSNs, thus allowing the researchers
to locate the appropriate MEPS records. Matching records for 1,962 prospects
were located and the resulting validity estimate was .80.

The most recent estimate of CAST's validity is based on data which are being
collected from a sample of 60 Army recruiting stations during January through
December of 1985. This sample was selected to be representative of the
population of approximately 2,000 Army recruiting stations in terms of
geographic location, population density, and ethnic composition. The
correlation between CAST and subsequent AFQT performance, based on preliminary
analyses of the first six months of data, is comparable to those obtained in
the earlier studies (r=.82; n=2,240).


Clearly, CAST is a valid predictor of AFQT performance. Of course, one
problem with computerized adaptive testing is the need to have a computer
standing by to administer it. Since recruiters do not always have this luxury,
there remains the need to use the Enlistment Screening Test (EST), a
paper-and-pencil predictor of AFQT. The initial validation information
regarding EST was provided by Mathews and Ree (1982). Although their data
yielded a healthy validity estimate of .83, a cross-validation of the test has
not been reported. Consequently, the recruiting stations which have been
providing CAST validation data over the past year have also been asked to
record the SSNs and EST scores of all prospects who take EST rather than CAST.
Based on six months of data, the validity estimate is .79 (n=685). As
expected, the validity of EST has been reaffirmed.

Proposed  Improvements 


Currently, efforts are underway to improve three specific aspects of CAST.
The first aspect concerns the kind of information that the test provides to
the recruiter. The second and third aspects involve the test's item pools
and stopping rule. At the present time, CAST provides bar charts that give
information about performance on the WK and AR subtests and the examinee's
predicted AFQT percentile score. There are two fundamental problems with
providing only point predictions of AFQT scores to recruiters. The first
problem is a function of the statistical naivete of most recruiters.
Because the great majority of recruiters do not understand the concept of
correlation, they do not adequately understand the nature of the point
prediction that they are given. Hence, recruiters complain that predicted and
actual AFQT scores often fail to be exactly the same. A second problem with
the use of point predictions concerns the way in which recruiters utilize
information from CAST. As indicated earlier, recruiters are primarily
concerned with the prospects' subsequent classification into one of three
groups of mental aptitude categories. Given these considerations, it seems
that recruiters would be better served if CAST provided output that reported
the odds associated with a given prospect falling into each of the three
critical mental categories.

Two approaches to category prediction are being studied using data from the
on-going validation effort described above. The first approach models the
strategy that recruiters probably follow. That is, one uses the point
prediction, which is based on a regression model, to determine the mental
category to which an individual will likely belong. The second approach is
based on classification analysis. In contrast to regression analysis,
classification analysis provides subtest weights that optimize category,
rather than point, predictions. Table 1 shows the percentage of cases which
were classified into each of three categories on the basis of these two
approaches. A comparison of the two approaches reveals that they differ with
respect to where their prediction errors occur. Both approaches are good at
identifying individuals who fall into categories 1-3A (approximately 75% of
the people who are predicted to be in 1-3A actually are in 1-3A).
Classification analysis, however, is much better than regression analysis at
identifying individuals who are in categories 4A and below (80% versus 55%
accurate prediction). This advantage is at the expense of a somewhat poorer
ability to identify prospects who are in category 3B.


TABLE 1

Percentage of Cases Classified
Into Critical ASVAB Categories

                              PREDICTED AFQT CATEGORY*

ASVAB AFQT CATEGORY        1-3A          3B       4A AND BELOW

1-3A                      76/77        23/18          1/5
3B                        25/27        60/38         15/35
4A and below               4/4         41/16         55/80

* Regression analysis to left of diagonal; classification analysis
to right of diagonal.

Regardless of which method of category prediction is used, the information
provided can be presented in a way that would be logical to recruiters. For
example, CAST software could be changed to report the probabilities
associated with an examinee falling into each of the three critical
categories. The recruiter could then compare the odds, and make an informed
judgment concerning his or her subsequent course of action. The important
point here is that presenting CAST results in such a fashion would make it
clear to recruiters that, although those results can be very useful, they are
not infallible.
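
One way such probabilities could be derived from a point prediction is sketched below in Python. This is an illustration only and not the method the authors propose: it assumes that prediction errors are normally distributed around the predicted AFQT percentile with a known standard error of estimate, and it places the cut points halfway between the integer percentiles that separate the groups. The function names and the example values are hypothetical.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def category_probabilities(predicted: float, se: float) -> dict:
    """Probability of each category group for one prospect.

    predicted: point prediction of the AFQT percentile (hypothetical input)
    se: standard error of estimate of that prediction (hypothetical input)
    """
    if se <= 0:
        raise ValueError("se must be positive")
    # Assumption: cut points sit halfway between the integer percentiles
    # that separate the groups (30|31 and 49|50).
    p_low = normal_cdf(30.5, predicted, se)
    p_mid = normal_cdf(49.5, predicted, se) - p_low
    p_high = 1.0 - p_low - p_mid
    return {"1-3A": p_high, "3B": p_mid, "4A and below": p_low}


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the CAST data.
    for group, prob in category_probabilities(predicted=55.0, se=12.0).items():
        print(f"{group:>13}: {prob:.2f}")
```

Under these assumptions, a prospect whose predicted score lies near a cut point receives sizable probabilities for two adjacent groups, which conveys the uncertainty that a single point prediction hides.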

Turning to the subject of CAST's stopping rule, the questions to be asked are
twofold. First, assuming that the stopping rule is to be based on number of
items, what is the optimal number of subtest items to administer? Second,
would it be better to base the stopping rule on the precision of the ability
estimate rather than on the number of items being administered? Data from
the latest validation study have been used to examine the first question.
The version of CAST which is being used in the 60 experimental recruiting
stations administers five more items per subtest than the operational version
of the test. It also records response time so that the time it takes to
administer various subtest length combinations can be compared.

Validity estimates and average completion times were computed for six subtest
length combinations (5, 10, and 15 WK items; 5 and 10 AR items). Validity
coefficients ranged from .79 to .85. Completion times ranged from just over
10 minutes to just over 18 minutes. Given that the validity estimate
associated with the current stopping rule is .82 and the average completion
time is a little over 12 minutes, it appears that an increase in subtest
length would not be justified. A substantial increase in completion time is
required for even a small increase in validity.

With an adaptive test, an ability estimate and the variance associated with
that estimate are computed each time an examinee answers a new test item.
Rather than stop the subtests after a given number of items are administered,
the test software can be altered to end the subtests once the variance
estimate has dropped to a given value. The advantages and disadvantages of
altering CAST to rely on the latter type of rule need to be evaluated
carefully. Also, the determination of an optimal variance criterion would
require further study.
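
A variance-based stopping rule of this kind can be sketched in a few lines of Python. The sketch below is an assumption-laden illustration, not the Bayesian sequential procedure of Jensema (1977) used by CAST: it swaps in a simple grid approximation of the posterior with a standard normal prior, administers items in a fixed order rather than selecting them adaptively, and uses hypothetical names and item parameters throughout.

```python
import math

D = 1.7  # Assumption: conventional 3PL scaling constant.
GRID = [-4.0 + 0.05 * i for i in range(161)]  # ability grid from -4 to +4


def p_correct(theta, a, b, c):
    """3PL probability of a correct answer."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def prior():
    """Assumption: standard normal prior on ability, normalized on the grid."""
    weights = [math.exp(-0.5 * t * t) for t in GRID]
    total = sum(weights)
    return [w / total for w in weights]


def update(posterior, item, correct):
    """Multiply the posterior by the likelihood of one response."""
    a, b, c = item
    new = []
    for t, w in zip(GRID, posterior):
        p = p_correct(t, a, b, c)
        new.append(w * (p if correct else 1.0 - p))
    total = sum(new)
    return [w / total for w in new]


def mean_and_variance(posterior):
    mean = sum(t * w for t, w in zip(GRID, posterior))
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, posterior))
    return mean, var


def run_subtest(items, responses, max_variance, max_items):
    """Stop once the posterior variance is at or below max_variance,
    or after max_items items, whichever comes first."""
    post = prior()
    n = 0
    for item, correct in zip(items, responses):
        post = update(post, item, correct)
        n += 1
        _, var = mean_and_variance(post)
        if var <= max_variance or n >= max_items:
            break
    mean, var = mean_and_variance(post)
    return mean, var, n


if __name__ == "__main__":
    # Hypothetical item parameters (a, b, c) and responses.
    items = [(1.0, 0.0, 0.2), (1.2, 0.5, 0.2), (0.9, -0.5, 0.2)]
    print(run_subtest(items, [True, True, False], 0.5, 10))
```

Keeping `max_items` alongside the variance criterion bounds the testing time for examinees whose estimates converge slowly, which is one of the trade-offs such a rule would have to balance.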

The two item pools currently contained in CAST will be expanded within the
next 2-3 years. Given the extensive use of the test, it is important that
the item pools are large enough to prevent the frequent recurrence of
particular test items. The possibility of developing items which will provide
optimal discrimination at the critical AFQT cutpoints is also being
considered.

Closing  Remarks 

CAST is a very good test which we are seeking to make even better. At the
present time, our efforts are primarily aimed at changing the software to
yield information that will be of the greatest use to recruiters. Our plans
over the next couple of years are aimed at ensuring the continued integrity
of the test itself.


Abstract

The ethics and pragmatics associated with developing an automated
performance test system to study the effects of various treatments make
repeated measures in small groups of subjects the customary research
paradigm. In such cases, test stability, reliability, and factor structure
take on extreme significance. In a Navy program 80 percent of 150 tests
failed to meet minimally acceptable psychometric requirements. Recent
findings with our battery show: acceptable psychometric properties in terms
of both differential stability and reliability for both the long and the
short battery; two factors available for the 7.5-minute test battery; four
for the 15-minute battery; and correlation with the WAIS. The factorial
richness of the battery is adequate and goes beyond the factors that can be
conveniently measured by more traditional paper-and-pencil tests into motor
speed dimensions that may have important practical implications for
assessment of concurrent functional capacity. Both factorial richness and
correlation with the more global cognitive capacity construct, IQ, might be
improved by the inclusion of subscales indexing verbal and arithmetic
abilities; however, adding factors is not without penalty. The trade-offs
among these issues (testing time, factor structure, stability, stabilization
time, and reliability) are discussed. About a dozen validation studies are
presently ongoing. What remains is to demonstrate functional validity in
the detection of human functional capacity deficits in a real-world
setting. This should be our primary mission in subsequent work.


INTRODUCTION 

Exotic work environments often include factors (e.g., weightlessness,
motion, fatigue) that disrupt performance. Furthermore, these
settings are typically populated by limited numbers of highly critical
workers. Kennedy and Bittner (1977) have observed that two connected
concerns associated with the measurement of performance under such
conditions are the lack of sensitive tests and a general unwillingness to
expend the time and effort necessary to standardize such tests. A program
designed to evaluate performance measures (PETER) was undertaken by the
Naval Biodynamics Laboratory, New Orleans, LA (Kennedy & Bittner, 1977;
Bittner & Carter, 1981; Kennedy, Bittner, Harbeson, & Jones, 1982), and
more than 150 performance tasks have been examined for suitability in
repeated measures research. Detailed descriptions of the evaluation
process and task metric selection criteria may be found in Bittner, Carter,
Kennedy, Harbeson, and Krause (1984). A listing of 30 tests which
survived this test and evaluation process appears in Bittner et al. (1984),
but for the most part the tests that were studied were paper-and-pencil
tests. The easy availability and economy of portable high speed computers
suggest that innovative methods for automated data collection and analysis
must be explored. Features that recommend microbased testing systems
include capabilities for fully automated test battery administration and
data storage, as well as portability and reduced size and weight.
Automated and portable microprocessors capable of administering and storing
performance measures and responses provide the obvious vehicle. The
purpose of this report is to provide a descriptive overview and a brief
prospectus of the Automated Performance Test System (APTS), and to report our
recent progress in the engineering analysis of the battery in terms of
stability, reliability, factor structure, feasibility, and predictive
validity. Ongoing validation studies which examine sensitivity to
treatments are being recounted elsewhere at this meeting (Johnson, Kennedy,
Merkle, Smith, & Bittner, 1985).

RECENT  TEST  DEVELOPMENTS 

Preliminary  Study 

Method and Analysis. The "best" six tests from the PETER program
(Bittner et al., 1984) were programmed on a portable microprocessor and
administered along with tests in their original paper-and-pencil formats.
Twenty-three Casper College male and female students were tested over four
replications on a 6.0 minute computerized battery. The group means,
standard deviations, and 4 x 4 intersession correlation matrices were
calculated for each task in each testing mode. Task group means and
standard deviations were examined across sessions for evidence of task
stabilization. Intersession correlations were assessed for evidence of
task differential stability. Rapid stabilization was expected since at
least theoretically comparable practice was received within both modes of
testing.

Results and Discussion. The data showed that all tasks in both modes
give good evidence of stability by the fourth session, with high
reliability efficiencies (r > .85) for 3 min. of testing. Improvement,
averaged across all tasks from sessions 1 to 4, was approximately 20%. The
amount of improvement for paper-and-pencil testing (22.4%) was, in general,
comparable to the amount of improvement demonstrated in the microbased
testing mode (19.3%). Typically, paper-and-pencil testing produced higher
scores across test sessions relative to microbased testing; however, from
the data the acquisition curves for both modes are strikingly similar.
Furthermore, the task standard deviations provide good evidence that none
of the tasks in either mode has reached a ceiling. Clearly, all indicators
point to good and comparable metric characteristics for the
paper-and-pencil and microbased versions of each task. The factor structure
obtained from the analyses of computerized test versions in each of the four
sessions indicates the presence of two well-identified factors in the
computer battery. Factor 1 is clearly a "motor" factor, probably related
to response speed; as such, it affects performance on Pattern Comparison
and Grammatical Reasoning. Factor 2 is just as definitively a "cognitive"
factor with its importance for various tests changing with practice. The
clarity of analyses under this constraint is encouraging. With respect to
the paper-and-pencil tests, there is reason to believe that these are
essentially the same factors as for computerized versions, but the
computerized versions appear to stabilize earlier and to be more clearly
defined. It should be noted that in both the microbased and
paper-and-pencil analysis a possible third factor gave indications of
emerging. The nature of the factor is unknown; however, the "automaticity"
of responses characteristic of well-practiced skills (Ackerman & Schneider,
1984) is a likely potential explanation. Also, a significant general
factor "g" may become evident within both modes of presentation. These
findings are described in detail elsewhere (Kennedy, Wilkes, Lane & Homick,
1985).

Stabilization  Study 

Method and Procedure. Thirty-one male and female college students were
recruited for participation. Prior to testing, subjects received a brief
introduction to the purpose of the study and were advised regarding the
general procedures associated with data collection. Ten (10)
paper-and-pencil batteries and ten (10) microbased batteries were
administered per subject, and the subjects were also tested individually with
the Wechsler Adult Intelligence Scale (WAIS). Six of the tests previously
recommended above as a "mini-battery" for environmental research were
included for examination. Four additional tests were also studied. The group
means, standard deviations, and intersession correlation matrices were
calculated for each individual paper-and-pencil and microbased test. Group
means and standard deviations were examined for evidence of test
stabilization, and intersession correlations were assessed for evidence of
differential stability. Task definition (magnitude of r after stabilization)
was determined directly and then adjusted according to the Spearman equation
for test length and called average stabilized "reliability-efficiency" (to
a 3-minute base). Predictive validity was assessed by comparison with an
individually administered test of intelligence. Factor analyses used the
principal factors method with squared multiple correlations as communality
estimates, followed by normalized varimax rotation.
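
The adjustment of reliability to a 3-minute base can be illustrated with a short Python sketch. The text does not print the equation; the sketch assumes it is the standard Spearman-Brown formula, r_k = k r / (1 + (k - 1) r), where k is the ratio of the base length to the actual test length. The function names and the example values are hypothetical and are not taken from the study.

```python
def spearman_brown(r: float, k: float) -> float:
    """Projected reliability when test length is multiplied by k,
    using the standard Spearman-Brown formula."""
    if k <= 0:
        raise ValueError("k must be positive")
    return k * r / (1.0 + (k - 1.0) * r)


def reliability_efficiency(r: float, test_minutes: float,
                           base_minutes: float = 3.0) -> float:
    """Reliability rescaled to a common testing-time base.
    Assumption: the length factor is base time divided by actual time."""
    return spearman_brown(r, base_minutes / test_minutes)


if __name__ == "__main__":
    # Illustrative values only: a 1.5-minute test with observed r = .75.
    print(round(reliability_efficiency(0.75, 1.5), 3))
```

Expressing every test on the same time base allows short and long tests to be compared fairly; the same formula with k = 2 projects the gain from administering a test for twice as long.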

Results and Discussion.

o  Stability of Means - The overall impression of the 10 tests was
that continued improvement occurred over all tests but slowed down in
those tests considered stabilized. The Sternberg test appears to stabilize
by Trial 4. The Preferred Hand Tapping and Non-preferred Hand Tapping
tasks stabilize rather late in practice, by about Trials 7 or 8, whereas
the Two Hand Tapping task appears stable by Trial 4. The Pattern
Comparison test stabilizes very rapidly, by at most Trial 3, whereas the
Manikin test stabilizes later at Trials 6 to 7, as does Code Substitution.
Grammatical Reasoning stabilizes rapidly, by Trial 3, and Reaction Time by
Trial 5. The Landolt C test of dynamic visual acuity did not appear to
stabilize over the 10 test days. There appears to be a strong learning
component in this test as it is presently structured on the computer; thus
its stability is insufficient for it to be retained in the test battery in
its current form. Clearly, the acuity test is a candidate for improvement in
future versions of the performance battery.

o  Standard Deviations - The standard deviations are constant and
give no evidence of ceiling effects.

o  Differential Stability - The Sternberg, the three Tapping tasks,
Pattern Recognition, and Manikin reach apparent differential stability by
Trial 3. Code Substitution, Grammatical Reasoning, and Reaction Time reach
differential stability somewhat later, but apparently by Trial 6. The
average reliabilities of these tests are all quite high. The
intercorrelation matrix for the Moving Landolt C, on the other hand, does
not give any indication of reaching differential stability.

o  Factor Analysis - We recognize the limitations of performing a
factor analysis with such a small sample, but are somewhat encouraged by
the good stability and high reliability of the tests and plan for these
results to be advisory. Factor I loads on Pattern Comparison and Code
Substitution of both computer-based and paper-and-pencil versions, and also
loads on the Manikin and Fitts Histogram tests, which did not have dual
versions. Factor II appears to be a Motor Speed factor because it loads on
Reaction Time and the three Tapping tasks as well as the Sternberg, but
loads on none of the paper-and-pencil tests; therefore, Factor II may
represent a construct that can be measured via computerized testing but not
by standard paper-and-pencil tests. Factor III is probably best thought of
as a Motor Control factor because it loads on Tapping tasks and Spoke and
Aiming. Factor IV is a pure Grammatical Reasoning factor, loading on both
forms of this test.

o  Correlations with WAIS - The microcomputer-based tests clearly
correlate most strongly with Performance IQ and less strongly with Verbal
IQ. The strongest simple correlation between the computerized tests and
Full Scale IQ was for Grammatical Reasoning, although Non-preferred Hand
Tapping was fairly high. The R squared values indicate that a substantial
proportion of Performance IQ variance can be predicted from the
computerized battery, but may also suggest that a more verbal subtest would
improve the relation to Full Scale IQ.

o  Paper and Pencil Tests - These tests essentially replicated the
microprocessor-based tests, and a more complete description of these
findings appears in Kennedy, Dunlap, Jones, and Wilkes (1985).

o  General - All but one of the microcomputer tests could be
recommended for a mid-ranged (<10 min.) battery. Based on the factor
analysis, we would suggest Pattern Comparison, Sternberg, Tapping, and
Grammatical Reasoning. We would also propose that each test be
administered twice as long in order to improve the reliability and thus
afford an opportunity for improved sensitivity.

Conclusions 

The philosophy of our approach to performance test development involves
three different phases. The first is to deal with only tests or tasks that
can be shown to be psychometrically sound. This requires that we
demonstrate stability of means and standard deviations within a few
administrations and, most important, that differential stability, the
symmetry or constancy of trial-to-trial intercorrelations, be shown to
occur quickly and at high values. The second phase is to show that the
battery has factorial multidimensionality and that the subscales
cross-correlate with earlier performance tests and other recognized
instruments of ability. Finally, it is necessary to demonstrate and
document sensitivity to factors known to compromise performance potential
in laboratory and ultimately real world situations.

The cross correlation between the Portable Human Assessment Battery and
the WAIS IQ measures was particularly interesting in that a substantial
portion of the performance subscale variance could be accounted for by this
self-contained, self-administered short battery of computerized tests. If
arithmetic and verbal subtests were implemented we would be able to
substantially improve the correlation with full-scale IQ. Of very great
interest were the strong relations shown between Nonpreferred Hand Tapping
and the verbal WAIS subscale. This was the second highest intercorrelation
with the full-scale IQ. Jensen and Munro (1979) hypothesized that complex
reaction times should be better predictors of general intelligence (g) than
simple reaction times; and certainly this prediction is borne out by the
present data, in that the Sternberg, a complex motor response task, has a
higher correlation than simple reaction time with the WAIS IQ.
Furthermore, Jensen and Munro (1979) found that motor speed showed as
strong a relation as complex reaction time to g, an unexpected finding, but
one that might relate to the strong relation between tapping and IQ in the
present results. A comparison with other mental tests is in order (e.g.,
the Armed Services Vocational Aptitude Battery (ASVAB)). The extreme length of
the ASVAB may be thought to improve the stability of its factors, but
actually the second session of the ASVAB may be no more stable than Session
2  of  the  computer  tests  we  reported,  the  latter  taking  lb  min  (cf., 
McCormick,  Dunlap,  Kennedy  &  Jones,  198b).  The  Wechsler  Adult  Intelligence 
Scale (WAIS) takes 1-2 hrs. to administer and purports to sample two
factors; these may or may not overlap with the four of the computer battery
we report. In future studies in this program we will attempt to anchor our
tests against these other better-known tests. For example, if only the
computer tests are considered it may be possible to sample two motor and
three cognitive factors, each within 6 min. and with reliabilities greater
than r = .70 for a 3 min. base.

Based on the data reported above, we believe that the following four
points are arguable and we would like to offer the following hypothetical
situations for speculation. (1) A Job Sample representing real world work
is likely to take > 100 hours to reach stability, and if a single
(composite) score (e.g., correct detection) based on 60 min. of testing
were used to characterize performance, it is unlikely that such a score
would have retest reliability greater than r = .60, and it might be lower.
If many scores are broken out (e.g., hits, RMS error, miss distance), the
individual reliabilities are likely to be lower than r = .30.
Alternatively, the microprocessor battery that I have used would probably
take < 1 hour (probably 30 min.) to reach stability on the test(s), and this
total score is likely to have high retest reliability (r > .90) for 12 min.
of testing, as is each of the subtest scores. (2) High reliability does not
assure sensitivity, but lack of reliability assures insensitivity. Most
tests advocated for use in unusual environments or with toxic substances
have been checked for neither reliability nor stability. (3) Our
microprocessor based battery total score correlates well with global
measures of intelligence. The single best predictor of job performance in
all military jobs is a global measure of intelligence. Our battery
probably shares considerable variance with military jobs and job
performance. (4) Once performance on a Job Sample and on our battery has
stabilized, and if both are corrected for attenuation, a large proportion
of the variance of the one hour test (perhaps 80%) would be shared with the
12 min. test, although the former may take 250 times as long to
stabilize.
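
Point (4) rests on the classical correction for attenuation, which divides an observed correlation by the square root of the product of the two reliabilities. The sketch below is only an illustration of that arithmetic: the observed correlation of .66 is a hypothetical value chosen for the example, and the reliabilities of .60 and .90 are borrowed from the speculative figures in point (1), not from measured data.

```python
import math


def disattenuate(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Correct an observed correlation for unreliability in both measures."""
    if r_xx <= 0 or r_yy <= 0:
        raise ValueError("reliabilities must be positive")
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical inputs (assumptions for illustration only).
observed_r = 0.66              # assumed Job Sample vs. battery correlation
job_sample_reliability = 0.60  # speculative retest reliability, 60 min. Job Sample
battery_reliability = 0.90     # speculative retest reliability, 12 min. battery

corrected_r = disattenuate(observed_r, job_sample_reliability, battery_reliability)
shared_variance = corrected_r ** 2  # proportion of variance shared after correction

print(round(corrected_r, 3), round(shared_variance, 3))  # 0.898 0.807
```

With these assumed inputs the corrected shared variance comes out near 80%, which shows how an observed correlation well below .90 could still be consistent with the figure suggested in the text.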
Background

The growing concern for our national security and the concomitant expansion in the size of the Navy point to an
increasing need for effective military leadership as expressed by the Secretary of the Navy. While leadership in general is
one of the most thoroughly analyzed concepts to be found in the research literature, military leadership has had
considerably less systematic attention. Military leadership, specifically leadership training for newly commissioned Navy
officers, is the focus of this paper.

Navy personnel have generally viewed themselves first and foremost as leaders. They see the need for strong
leaders (those who have the initiative, courage, and knowledge to think and act in situations where military objectives
may not be easily recognized) as more critical than ever before.

There is concern, however, that many of the events during the past several decades have eroded traditional leadership
values. The result has been an overemphasis on management-based, theoretical frameworks designed by social scientists
who do not really appreciate leadership in a military context. Articles on this topic written by the uniformed military
are common. Sarkesian (1985), for example, traces the evolution of the corporate management model back to the
McNamara years, which focused on cost-effectiveness and econometrics and shifted the focus of leadership in battle to
the pursuit of management goals. He feels that the negative impact of this shift was felt in Vietnam. Byron (1985)
believes that in peacetime the leadership demanded in combat situations is forgotten and, instead, good managers are
rewarded. To summarize, many feel that the Navy is losing its military leadership capability, which will in turn affect
battle readiness.

An obvious paradox exists in the Navy community in light of all the concerns voiced when it comes to what should be
done to develop these ideal leaders. Prior to the inception of this research effort, the authors conducted numerous
interviews with senior Navy officers and a common theme emerged. Leadership training is conducted when everything
else is finished (it is the last priority) and many, given their way, would throw it out altogether.

In part, this attitude toward leadership training stems from priorities. To teach leadership, something else must go.
But this attitude seems to stem also from a lack of agreement as to what military leadership really is, and how (or even
if) it should be trained. This definitional problem can be remedied if leadership is viewed as consisting of three
dimensions: management, interpersonal skills, and warriorism. While warriorism may or may not be trainable, the
majority opinion of leaders represented in a recently published book about military leadership (Taylor & Rosenbach, 1984)
is that leadership skills (i.e., managerial and interpersonal skills) can be learned. This study addresses issues relevant to
"teaching" these leadership skills to newly commissioned naval officers.

Current  Leadership  Training  in  the  Navy 

The Navy currently provides some training for new officers to help them assume their leadership role. Each of the
commissioning sources has a leadership curriculum for naval officers as part of their education and preparation. The
specialty schools (e.g., Surface Warfare Officers School (SWOS)) continue the leadership training process.

Until  recently,  the  leadership  training  program  at  SWOS  was  a  2-week  course  designed  as  part  of  the  Leadership 
Management  and  Education  Training  (LMET)  program  implemented  Navy-wide  in  1978  in  an  attempt  to  standardize  the 
Navy's leadership training. Since 1978 many of the LMET courses have been dropped or shortened. The reasons for this
are  varied,  but  reflect  to  some  extent  the  attitude  mentioned  earlier  that  leadership  training  is  low  priority  training.  It 
also  reflects  a  desire  to  shorten  the  overall  training  pipeline  and  to  get  officers  into  the  fleet  sooner. 

Purpose 

If officer training is to be accomplished in the most efficient manner, it is imperative that the training provided be
relevant, effective and not redundant. This study set out to address two primary questions. Does training make a
difference? That is, are students learning anything they feel will be relevant to their leadership positions? If so, what are
they learning and where, and can leadership training be provided more efficiently?

METHOD 


Sample

The sample for this study consisted of 581 newly commissioned junior officers at SWOS. Four hundred and eighty-
three students were questioned on their first day of SWOS about leadership training and preparation at their
commissioning source. Ninety-eight additional students were questioned before and after participating in LMET. Table 1
presents a description of the sample.
Table 1

Description of SWOS Sample

                                              First Day     Pre/Post LMET
                                              of SWOS       Sample
                                              (N = 483)     (N = 98)

Commissioning Source
    USNA                                         196             3
    OCS                                           37            68
    NROTC                                        250            21

Academic Major
    Science/Engineering                          300            il
    Humanities/Other                             183            59

Average Age                                       23            24

% choosing surface as 1st choice community        64            55

% intending to make Navy a career
    Yes                                           25            18
    No                                            10            20
    Unsure                                        65            62


Questionnaires 

Four  versions  of  a  questionnaire  were  designed  to  assess  leadership  training,  leadership  preparedness,  and  self-rated 
leadership  abilities,  and  to  collect  a  number  of  biographic  and  demographic  characteristics  of  the  students  (e.g.,  academic 
major, age, sex, commissioning source). Among the four versions the following topics were also measured: managerial
style, achieving style as measured by the Manifest Needs Questionnaire, and action vs. state orientation. These topics
will  not  be  addressed  in  this  paper. 

Questionnaires  were  administered  in  classroom  settings  by  SWOS  instructors.  Administrators  were  given  a  set  of 
instructions to read to the students before they filled out the questionnaires. Students were assured that all information was
confidential. 


RESULTS 


Does Leadership Training Really Make a Difference?

An important question in this study had to do with the merit of leadership training: does it really make a difference?
Results indicated that leadership training is important to preparedness. A number of different analyses formed the basis
of this conclusion. First, new students' reports of their training were correlated with their reports of preparedness.
As expected, the correlations between these two indicators were quite high (.5 to .7). At least in the
students' minds, their training is related to their levels of preparedness.

Second,  analyses  indicated  that  students'  perceptions  of  their  leadership  training  and  preparation  were  more  related 
to  their  feelings  of  preparation  to  go  to  war  if  necessary  than  were  their  self-rated  leadership  abilities.  Preparation  to  go 
to  war  is  not  only  determined  by  ability;  training  has  an  impact. 

Third,  a  number  of  open-ended  comments  provided  by  new  students  indicated  that  they  felt  they  needed  more 
leadership  training,  especially  in  the  interpersonal  aspects,  in  order  to  best  assume  their  role  as  division  officer. 

Fourth, 98 students were questioned before and after LMET about their leadership training and preparedness. On the
basis of paired t-tests, 14 of the 21 items measuring different aspects of leadership preparedness showed significant
improvements as a function of LMET. Many items had differences greater than a standard deviation. Specifics of these
differences will be discussed in a later section.
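
For readers unfamiliar with the paired t-test used for the pre/post LMET comparison, the computation for a single questionnaire item can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name and the four pre/post ratings are hypothetical and are not data from the study.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for a paired-samples t-test."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = statistics.mean(diffs) / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical ratings on the 1-5 scale for one item, before and after training.
pre = [2, 3, 3, 4]
post = [3, 4, 4, 4]
t_stat, df = paired_t(pre, post)
print(t_stat, df)
```

The resulting t value would then be compared with the critical value for the given degrees of freedom; in the study this comparison was repeated for each of the 21 items.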

What Leadership Training is Provided?

Training at the Commissioning Sources

The level of leadership training new officers received at the three major commissioning sources was of particular
concern in this study. Table 2 presents a list of the leadership training and preparation topics assessed and indicates the
areas where officers felt most and least trained and prepared. In general, officers entering SWOS felt they were trained
"to some extent" (the midpoint on the scale, which ranged from 1, "not at all," to 5, "to a very large extent") in most
aspects of leadership. The overall levels of perceived preparation were somewhat higher (mean = 3.6).
Table 2

Leadership Training and Preparation Topics Measured and
Extreme Averages By Commissioning Source

Column headings: New SWOS Students(a): USNA (N = 196) Trng, Prep; OCS (N = 37) Trng, Prep;
NROTC (N = 250) Trng, Prep. Greatest Improvement after LMET (N = 98), marked X.

a. Making the transition from the Naval Academy, OCS or NROTC to the operational Navy
b. Taking responsibility for a division of enlisted personnel
c. Understanding Navy procedures and protocol: 3.5
d. Relieving the division officer in your first division officer assignment: 2.8 2.9 2.7 3.0 X
e. Knowing how to motivate enlisted personnel: X
f. Performing the paperwork requirements as a division officer (PMS, PQS, etc.): 2.5 2.2 2.5 3.0 2.6 2.8
g. Managing your time and setting priorities when you have a heavy workload: 9.9 9.3 3.6 3.9 3.6 3.9
h. Talking to a large group of people who work for you: X
i. Briefing your superior, or the CO about an issue in your division
j. Counseling subordinates about personal matters: 2.9
k. Counseling poor performers: 3.3 X
l. Handling alcohol and drug abuse problems among your subordinates: 3.3
m. Resolving conflicts among your crew members: 2.7
n. Listening effectively: 3.9
o. Managing stress (i.e., lack of sleep, disappointing your boss, overwork, conflicts): 9.1 9.2 3.9
p. Communicating with people effectively: 3.9 3.9
q. Demonstrating concern for your subordinates
r. Setting goals: 9.1 9.2 3.7
s. Planning work: 9.1 9.1 3.5
t. Interacting with Chiefs in your division: 3.1 X
u. Rewarding and disciplining your subordinates: X

(a) Only areas with highest and lowest average levels of training or preparation are presented in this table.


The levels of training differed across commissioning sources in 18 of 21 areas. For the most part, Naval Academy
(USNA) graduates felt better trained than those from Naval Reserve Officer Training Corps (NROTC) or Officer
Candidate School (OCS), and OCS graduates generally felt least trained. A noteworthy exception to this was training in
how to interact with chiefs in the division. In this area, NROTC graduates felt best trained and Naval Academy graduates
felt they had received little training.

The aspects of leadership in which new SWOS students, across commissioning sources, felt they had received the most
training and were most prepared were "managing time" and "setting priorities with a heavy workload." Naval Academy
graduates, in addition, felt well trained and prepared in terms of managing stress, setting goals and planning work.
Graduates  of  OCS  and  NROTC,  however,  felt  more  prepared  in  areas  of  listening  effectively  and  communicating 
effectively  with  people  than  they  did  in  setting  goals  and  planning  work. 

Areas  where  new  SWOS  students  felt  least  prepared  were  "relieving  the  division  officer"  and  "performing  paperwork 
requirements."  Those  from  OCS  also  reported  little  training  in  terms  of  "counseling  subordinates,"  although  they  felt 
prepared  to  some  extent  to  do  this. 

Although  not  reflected  in  Table  2,  when  examining  the  levels  of  training  and  preparation  on  the  more  interpersonal 
aspects  of  leadership  (e.g.,  knowing  how  to  motivate  subordinates,  briefing  superiors,  counseling  subordinates),  the  levels 
of  training  were  rather  low,  and  the  levels  of  preparation  were  moderate.  This  was  especially  true  for  graduates  of 
NROTC  and  OCS. 

It  appears  that  in  general,  students  leaving  the  commissioning  sources  feel  better  prepared  in  terms  of  managerial 
type  skills  (goal-setting  and  planning)  than  in  the  interpersonal  aspects  of  leadership  (counseling  and  disciplining). 

LMET  Training  at  SWOS 

The  results  of  LMET  training  improve  this  state  of  affairs  to  some  extent  (see  Table  2).  Leadership  areas  which 
showed the greatest improvements as a result of LMET were "motivating enlisted personnel," "talking comfortably before
a  large  group,"  "counseling  poor  performers,"  "interacting  with  chiefs  in  your  division,"  "rewarding  and  disciplining 
subordinates,"  and  "relieving  the  division  officer." 

Also  of  interest  were  two  items  in  which  perceptions  of  the  levels  of  training  decreased  from  time  1  to  time  2.  These 
areas  were  "understanding  Navy  procedures  and  protocol"  and  "performing  the  paperwork  requirements  as  a  division 
officer." It seems likely that LMET served as a realistic preview and made officers aware of the large amount they didn't
know  in  these  areas. 

Self-Ratings  of  Leadership  Ability 

While the majority of students' perceptions of their leadership training differed significantly across commissioning
sources,  self-ratings  of  leadership  ability  in  eleven  areas  did  not  differ. 

The area in which all students felt most capable was "doing whatever it takes to get the job done." Aspects of
leadership  in  which  students  felt  least  capable  were  "speaking  comfortably  in  front  of  a  group,"  and  "motivating 
subordinates  to  do  jobs  they  don't  want  to  do."  In  general,  officers  feel  more  able  to  perform  managerial  duties  than  to 
handle  the  more  interpersonal  aspects  of  their  leadership  role. 

The  leadership  ability  ratings  of  Academy  graduates  were  somewhat  surprising.  While  Academy  graduates  felt  better 
trained  and  better  prepared  than  graduates  of  the  other  two  commissioning  sources,  they  did  not  rate  their  leadership 
abilities any higher. This seems to dispel, to some extent, the stereotype that Academy graduates have trouble adjusting
in their first tour as division officers due to over-confidence. It also suggests that abilities are not the only factor
relevant  to  how  well  prepared  officers  feel  they  are.  Training  does  play  a  role. 

Additional  Findings  of  Interest 

Who  or  What  Influenced  Leadership  Skills? 

New  SWOS  students  were  asked  the  extent  to  which  classroom  instruction,  examples  of  military  leaders,  experience 
leading  others,  as  well  as  things  they  learned  before  they  entered  USNA,  OCS  or  NROTC  influenced  the  leadership  skills 
they had acquired. Academy and NROTC graduates felt examples of military leaders and experience leading people were
most  influential.  OCS  graduates  felt  they  had  been  most  influenced  by  things  they  learned  before  attending  OCS.  OCS 
graduates  also  tended  to  feel  their  levels  of  training  had  been  low  in  comparison  to  their  levels  of  preparation.  This 
supports  the  conclusion  that  OCS  graduates  feel  their  leadership  skills  are  less  a  function  of  their  officer  training  than  do 
those  who  go  through  NROTC  or  the  Naval  Academy.  (This  is  not  surprising  when  the  length  of  these  officer  training 
programs is considered (i.e., four months for OCS as opposed to four years for USNA and NROTC).) All three groups felt
classroom  instruction  was  least  influential,  though  they  still  reported  it  influenced  them  "to  some  extent." 

Assuming  Different  Leadership  Roles 

Functioning  effectively  as  a  naval  officer  encompasses  a  number  of  different  leadership  roles.  The  officer  must 
function  as  a  leader  of  others,  a  technician,  a  professional,  and,  at  times,  a  warrior.  SWOS  students  were  asked  how  well 
prepared they felt to assume each of these roles (see Table 3). Students from all commissioning sources, at SWOS for the
first  day,  felt  most  prepared  to  assume  the  role  of  professional,  and  least  well  prepared  to  assume  the  role  of  technician. 
One  might  expect  that  lu  weeks  of  intense  SWO  training  would  improve  officers'  confidence  in  their  preparation  to 
assume the role of technician, but the sample of students questioned on their last day of SWOS also felt least prepared to
assume  the  role  of  technician  when  compared  to  the  other  three  roles.  This  may  be  because  officers  are  trained  to 
manage  technicians  rather  than  to  develop  technical  expertise.  Alternatively,  their  training  may  serve  to  make  them 
aware  of  how  much  there  is  to  know. 

The overall levels of preparedness to go to war if necessary were fairly high (mean = 3.6). All 21 aspects of leadership
training were significantly related to preparation to go to war. The training variables most highly correlated with
preparation to go to war were in areas of motivating enlisted personnel, handling alcohol and drug abuse problems,
resolving conflicts and planning work. These correlations were all approximately .26.

Table 3

Average Levels of Preparation to Assume Diverse Leadership Roles
Before and After SWO Training by Commissioning Source

                          New SWOS Students                            SWOS Graduates
                   USNA      OCS       NROTC     Overall       USNA     OCS      NROTC    Overall
Leadership Role    (N=196)   (N=37)    (N=249)   (N=482)       (N=5)    (N=86)   (N=34)   (N=125)

Technician          3.9       3.2       3.1       3.2           9.9      2.9      3.0      3.0
Professional        9.3       3.9       3.9       9.0           9.2      3.6      3.9      3.7
Leader              3.9       3.9       3.7       3.8           3.8      3.5      3.7      3.5
Prepared to go
  to war            3.8       3.8       3.5       3.6           9.0      3.6      3.6      3.6


Academic Major and Leadership Preparedness

The Secretary of the Navy, John Lehman, and others have suggested that a strict science and engineering curriculum
may  be  too  academically  narrow  to  provide  new  officers  with  the  well-rounded  education  they  need  to  be  good  leaders. 
To  address  this  question,  correlations  were  computed  between  academic  major  (science  and  engineering  vs.  other)  and 
students'  perceptions  of  their  leadership  training,  preparation  and  ability.  (These  correlations  were  partial  correlations 
controlling  for  commissioning  source.)  No  significant  relationships  were  found  between  academic  major  and  training, 
preparation  or  ability.  Since  the  measures  of  leadership  are  all  self-report,  they  must  be  treated  with  some  caution,  but 
nonetheless,  no  relationships  emerged. 
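
A partial correlation removes the influence of a control variable from the relationship between two others. The sketch below shows the first-order formula for a single numeric control variable. It is a simplification: commissioning source has three categories, so the actual analysis would presumably have needed dummy-coded controls or a regression-based approach, which the paper does not describe. The function name and the input correlations are hypothetical.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0:
        raise ValueError("control variable is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical example: academic major (x), perceived preparation (y),
# and one numeric indicator of commissioning source (z).
r_major_prep = 0.50
r_major_source = 0.50
r_prep_source = 0.50
print(round(partial_corr(r_major_prep, r_major_source, r_prep_source), 3))  # 0.333
```

In this made-up case a raw correlation of .50 drops to about .33 once the control variable is removed, which illustrates why controlling for commissioning source matters when majors are unevenly distributed across sources.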


DISCUSSION 


It  was  encouraging  to  discover  that  students  felt  their  training  influenced  their  levels  of  preparation  to  become 
leaders,  and  that  they  believed  classroom  instruction  was  useful.  It  was  also  interesting  that  academic  major  had  no 
relationship  to  students'  perceptions  of  their  leadership  preparation  or  leadership  abilities.  The  stereotype  that  science 
majors  have  narrow  academic  experiences  and,  therefore,  tend  to  be  less  sensitive  to  interpersonal  concerns  was  not 
supported. 

Of particular interest were the training issues addressed. It appears that each of the commissioning sources is
preparing new officers to some extent to assume their leadership responsibilities. The bulk of this training seems to
impart managerial skills rather than the more interpersonal leadership skills. LMET at SWOS furthers this leadership
preparation with a positive impact on the interpersonal dimensions. If the Navy's goal is to provide the most efficient
training pipeline, it appears that the commissioning sources would do well to concentrate formal leadership training in the
managerial skills, leaving the interpersonal skills to LMET. This would be an improvement over providing minimal training
in all areas at both schools. It also seems that an interpersonal perspective in managerial training would be worthwhile
(e.g., planning work and setting goals for subordinates). Further, while the length and method of the training experiences
differ at the various commissioning sources, they would do well to agree on a standard set of leadership issues to be
addressed in leadership training and do them as well as possible in the time frame provided.

While the findings from this study are based on self-report, they are suggestive of issues worthy of future pursuit.
This work will be followed up with input from SWOS instructors as to students' leadership abilities, an evaluation of the
shortened LMET curriculum and, optimally, a one-year follow-up of individual performance in the fleet.
A sample of 320 Air Force aircraft maintenance officers (AMOs) was
surveyed  using  the  updated  version  of  Yukl's  Managerial  Behavior  Survey  (MBS), 
to  measure  leader  behavior  of  the  AMO's  superior  officer,  and  other  scales 
focusing  on  the  AMO's  perception  of  his/her  own  leadership  development. 
Specific  development  methods  used  by  AMOs  and  the  perceived  importance  of  each 
were  explored.  Furthermore,  suggestions  were  collected  on  ways  to  improve 
development  methods  available  to  them  in  the  Air  Force.  The  leadership 
development  activities  were  correlated  with  the  superior's  leader  behavior  and 
with  demographic  and  organizational  variables.  The  personal  factors  of  age 
and  rank  were  found  to  be  associated  with  leadership  development. 

Participation  in  8  of  19  leadership  activities  correlated  significantly  with 
the  degree  of  importance  placed  on  the  activities.  Analysis  of  the  MBS 
results  indicated  certain  categories  of  the  superior's  leader  behavior  were 
significantly associated with the perceived leadership development of the AMO.


Introduction 

Leadership  is  a  constant  concern  in  the  military  environment,  yet  very 
little  is  known  about  how  leaders  are  developed.  The  overall  objective  of 
this  research  was  to  identify  methods  of  leadership  development  used  by  Air 
Force  officers,  specifically  junior  aircraft  maintenance  officers  (AMOs),  with 
special  interest  in  the  impact  of  the  immediate  superior's  leader  behavior  on 
the  junior  officer's  development. 

For  the  purposes  of  this  research,  leadership  is  defined  as  a  dynamic, 
goal-directed process of influence between leader and follower, including the
interaction  of  each  with  the  situation  (Yukl,  1981).  Leadership  development 
is  broadly  defined  as  any  method  or  activity  used  by  individuals  to  enhance 
their  personal  ability  to  influence  subordinates  toward  goal  accomplishment. 

Leadership  Theory  and  Research 

The  subject  of  leadership  has  been  addressed  by  many  scholars  from 
ancient  times  to  the  present.  One  of  the  major  approaches  which  has  been  key 
in  understanding  the  concept  is  the  leader  behavior  approach.  Research  at 
Ohio  State  University  resulted  in  identification  of  the  classic  dimensions  of 
consideration  and  initiation  of  structure  (Halpin  and  Winer,  1957),  while 
efforts  by  the  University  of  Michigan  Survey  Research  Center  produced  the 
distinction of job-centered and employee-centered leadership (Likert, 1961)
and,  later,  the  four  categories  of  support,  interaction  facilitation,  goal 
emphasis, and work facilitation (Bowers & Seashore, 1966). These
categorizations, although important in developing an understanding of leadership and
still  used  today  in  the  context  of  some  of  the  situational  theories,  are  not 
without  their  problems. 


One of the key issues has been the recognition that ". . . these broadly
defined categories provide too general and simplistic a picture of leadership.
They fail to capture the great diversity of behavior required by most kinds of
managers and administrators" (Yukl, 1981, p. 120). This realization led Yukl
and his colleagues to develop a more comprehensive categorization, which
currently includes 13 dimensions. According to Yukl, "The advantage of the
new taxonomy is that it has a larger number of more specific behavior
categories than earlier ones, and it includes most behaviors found to be
important in leadership research" (Yukl, 1981, p. 128). Because of these
advantages, Yukl's Managerial Behavior Survey (MBS) was selected for the
description of superior officers' leader behavior in this research.

Military Leadership Development


Studies of the process of leadership development in the military services
have focused primarily on professional military education (PME) or other formal
programs. The Air Force has essentially come full circle in its philosophy
of educating leaders in the past 12-15 years. Reports in the 1970s expressed
concern by senior officers on the need for development of technical and
management skills (Dobias, 1974; Robinson, 1974) and suggested better
pre-commissioning training and even on-the-job training. Then came concern in the
late 1970s, still present today, that the Air Force had gone too far in the
technical training of young officers, making them more occupational than truly
professional (Gosnell, 1980). Another recent Air Force study of particular
interest emphasized the important role played by commanders in serving as
leadership models for their subordinates (Benton, 1981). This important
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Force  with  its  emphasis  on  formal  programs  of  education  and  training. 

Research  within  the  U.S.  Army  reinforces  this  emphasis  on  the  supervisor- 
subordinate  relationship.  In  analyzing  problems  in  junior  officer 
development, Wellins and others (1980) noted that a key issue was that "the
senior officer may not take time to supervise, guide, and correct the
performance of the new lieutenant" (p. 5). Another important study conducted
within the Army concluded that the most successful leader training results
from experiential processes ("learning by doing") rather than analytical or
procedural  processes  (Shriver  et  al.,  1980). 

Perhaps  the  most  revealing  observation  from  a  review  of  the  literature  on 
leadership  theory  and  leadership  development  is  that  there  has  been  little  if 
any  research  to  identify  methods  which  present  or  prospective  leaders  actually 
use for leadership development. To identify these methods and their relative
usefulness for a specific group of leaders was the focus of this research.


Methodology 


Sample


The  original  population  of  interest  was  all  U.S.  Air  Force  aircraft 
maintenance  officers  serving  in  the  grade  of  lieutenant  or  captain.  For 
practical  considerations,  this  was  narrowed  to  those  serving  in  the 
continental United States (CONUS) in the three largest major commands--
Military Airlift Command (MAC), Strategic Air Command (SAC), and Tactical Air
Command (TAC). Generalization of the findings of this research should be
limited to these specific population parameters. According to official
sources, the population size was 730 officers. Air Force officials authorized
surveys for a random sample of 320 individuals to allow for a 95 percent
confidence level.
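
The relationship between the population of 730, the sample of 320, and the 95 percent confidence level can be illustrated with the standard formulas for estimating a proportion from a finite population. This is only a sketch: the paper does not state which formula or margin of error was used, so the conservative proportion p = 0.5, the z value of 1.96, and both function names are assumptions made for illustration.

```python
import math


def margin_of_error(n, population, z=1.96, p=0.5):
    """Half-width of a confidence interval for a proportion.

    Applies the finite population correction, which matters here because
    the sample is a large share of the population.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    correction = math.sqrt((population - n) / (population - 1))
    return z * standard_error * correction


def required_sample_size(population, margin, z=1.96, p=0.5):
    """Smallest sample size that achieves the requested margin of error."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2  # size for an infinite population
    return math.ceil(n0 / (1 + (n0 - 1) / population))


# Figures taken from the text: 730 officers, 320 surveys authorized.
print(round(margin_of_error(320, 730), 3))
print(required_sample_size(730, 0.05))
```

Under these assumptions, a full return of 320 surveys would correspond to a margin of roughly four percentage points at the 95 percent level; the margin is wider for the smaller number of surveys actually returned.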

Procedure 

Surveys  were  mailed  to  officers  at  their  duty  addresses,  accompanied  by  a 
cover  letter  signed  by  the  Dean  of  the  School  of  Systems  and  Logistics,  Air 
Force  Institute  of  Technology  (AFIT/LS).  Participation  was  voluntary,  and 
respondents  were  assured  of  anonymity.  Respondents  were  asked  to  complete  and 
return  an  optical  scanning  sheet  for  the  standard  survey  items.  Space  for 
comments  and  suggestions  was  provided  at  several  points  in  the  questionnaire 
booklet.  Respondents  were  instructed  to  return  all  materials  in  a  postage- 
paid  return  envelope. 

Measures 

The MBS, provided by Dr. Gary A. Yukl of the Business School, State
University  of  New  York  at  Albany,  was  used  to  assess  subordinate  perceptions 
of  their  superior  officer's  leader  behavior.  This  instrument  measures  the 
frequency  of  130  leader  behaviors  (ten  items  in  each  of  thirteen  categories). 

The rest of the survey was composed of items written especially for this
research effort. It included: (a) standard demographic information, including
sex, age, source of commission, rank, major command, prior service, and
organization/level (7 items); (b) perceived extent of leadership development
(4 items); (c) immediate superior's leadership effectiveness (1 item);
(d) perceived importance of leadership development activities (18 items); and
(e) extent of involvement in leadership development activities (15 items).
Items on PME completed by different methods were grouped together in section
e, producing a smaller number of items than in section d.


Results


Respondent  Profile 

From  the  random  sample  of  320  officers,  185  usable  surveys  were  returned, 
for  a  return  rate  of  57.8%.  This  was  considered  reasonably  good  since  the 
survey  package  contained  27  pages  and  190  items.  The  sample  was  predominantly 
male  (84%)  with  the  median  age  category  being  30-34  years.  Most  received  their 
commission from Officer Training School (58%) vice ROTC (37%) or USAFA (4%).
Consistent  with  this  pattern,  the  majority  had  prior  enlisted  service  (56%). 
Almost  half  (48%)  were  captains,  with  about  an  equal  split  among  the  rest 
between  second  lieutenant  (26%)  and  first  lieutenant  (25%).  Half  were 
assigned  to  TAC  (51%)  with  the  rest  divided  almost  equally  between  SAC  (25%) 
and  MAC  (24%).  The  largest  number  were  assigned  to  Organizational  Maintenance 
or  Aircraft  Generation  Squadrons  (37%);  almost  two  thirds  (65%)  were  assigned 
to  the  unit  level  versus  the  DCM  staff  or  a  higher  headquarters. 

Leadership  Development  Activities 

Respondents  were  asked  to  rate  the  perceived  importance  of  leadership 
development  activities  on  a  five-point  scale  ranging  from  "not  important"  (1) 
to  "extremely  important"  (5).  They  were  later  asked  to  indicate  their 
completion of certain activities (e.g., Squadron Officer School) or extent of
involvement in other ongoing endeavors (e.g., number of hours per week spent
in  personal  study  of  leadership).  Results  are  summarized  in  Table  1. 


Table 1

Leadership Development Activities

                                MEAN
                                IMPORTANCE    INVOLVEMENT
ACTIVITY                        RATING        (Average or Percent)

Leadership of NCOs              4.21          [illegible]  Times/Week
Observation of Superiors        4.15          4.19         Times/Week
TDY Deployments                 4.15          3.03         Weeks/Year
Leadership of Airmen            [illegible]   [illegible]  Times/Week
Leadership of Peers             3.91          4.07         Times/Week
ACSC in Residence               [illegible]   0.0%         Have Attended
Other Leadership Activities     [illegible]   2.23         Hours/Week
ACSC by Seminar                 3.12          [illegible]  Have Completed
Graduate Degree                 3.07          29.7%        Have Completed
Personal Leadership Study       [illegible]   [illegible]  Hours/Week
ACSC by Correspondence          [illegible]   [illegible]  Have Completed
Sports Leadership               [illegible]   1.27         Hours/Week
Other AF-Related Activities     [illegible]   0.67         Hours/Week
Community Leadership            [illegible]   1.02         Hours/Week
Professional Org. Leadership    2.60          1.02         Hours/Week
Church Leadership               2.66          [illegible]  Hours/Week
SOS in Residence                [illegible]   34.6%        Have Attended
LPDP                            [illegible]   [illegible]  Have Completed
SOS by Correspondence           2.29          34.0%        Have Completed


The  most  important  activities  to  the  AMOs  were  their  working  experiences 
with  NCOs,  airmen,  superior  officers  and  peers,  and  on  TDY  deployments.  The 
least  important  in  their  eyes  were  the  formal  programs  which  are  typically 
emphasized  in  official  circles.  These  included  the  Lieutenants  Professional 
Development  Program  (LPDP)  and  PME,  especially  by  correspondence. 

Chi-square  tests  were  employed  to  determine  if  any  significant 
relationships  existed  between  the  rated  importance  of  leadership  development 
activities  and  the  extent  of  involvement  in  those  activities.  Eight  of  the 
nineteen  leadership  development  activities  were  found  to  have  a  statistically 
dependent  relationship  at  the  .05  level  of  significance.  They  were: 
leadership of NCOs, TDY deployments, other leadership activities, graduate
degree, personal leadership study, other Air Force related activities,
professional organization leadership, and church leadership. These activities
spanned the spectrum of importance ratings and level of involvement, so no
pattern  was  obvious.  The  findings  could  be  interpreted  to  say  that  the 
officers  perform  a  given  activity  because  they  perceive  its  importance  in 
developing  their  leadership  ability  or  that  the  activity  is  rated  as  being 
important  simply  because  the  officers  are  involved  in  it. 
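
The chi-square test of independence used for each activity can be sketched as follows. The counts are hypothetical, since the paper does not report the contingency tables, and the split into low/high importance and low/high involvement is an assumption made only to keep the example small. The critical values are the standard .05-level chi-square values for 1 to 4 degrees of freedom.

```python
def chi_square_independence(table):
    """Return the chi-square statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Standard chi-square critical values at the .05 level, keyed by df.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

# Hypothetical counts: rows are low/high rated importance,
# columns are low/high reported involvement.
counts = [[30, 10],
          [20, 40]]

statistic, df = chi_square_independence(counts)
dependent = statistic > CRITICAL_05[df]
print(round(statistic, 2), df, dependent)
```

A statistic above the critical value indicates only that importance and involvement are related; as the text notes, it says nothing about which one drives the other.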

Managerial Behavior Survey Results

Before  using  the  MBS  results,  internal  consistency  reliabilities  were 
computed  for  each  of  the  thirteen  scales;  all  were  .86  or  greater.  Also, 
means  and  standard  deviations  of  the  scale  variables  (formed  by  summing  the 
ten  item  scores  for  each  scale)  were  computed  to  provide  a  picture  of  the 
average  leader  behavior  of  the  superior  officers.  Results  are  summarized  in 
Table 2. Of interest in the present context was the fact that "developing"
behavior was one of the lowest rated categories.

Table 2

Leader Behavior Scale           Cronbach's                 Standard
(10 items each)                 Alpha         Mean         Deviation

Supporting                      .90           [illegible]  [illegible]
Problem Solving & Crisis Mgt.   [illegible]   2.78         [illegible]
Interfacing                     [illegible]   2.68         .669
Representing                    .92           [illegible]  [illegible]
Informing                       [illegible]   [illegible]  .684
Consulting & Delegating         [illegible]   [illegible]  [illegible]
Monitoring Operations           [illegible]   [illegible]  [illegible]
Harmonizing & Team-Building     [illegible]   [illegible]  [illegible]
Recognizing & Rewarding         [illegible]   [illegible]  [illegible]
Planning & Organizing           [illegible]   [illegible]  [illegible]
Motivating Task Commitment      [illegible]   [illegible]  [illegible]
Developing                      [illegible]   2.27         .797
Clarifying Roles & Objectives   .90           2.21         .766

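
The internal consistency reliabilities reported for the MBS scales are coefficients of the Cronbach's alpha type. A minimal sketch of that computation is given below. The response data are hypothetical, a three-item scale is used instead of the ten-item MBS scales for brevity, and the function name is illustrative only.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a scale.

    `responses` holds one row per respondent, each row listing that
    respondent's scores on the k items of the scale.
    """
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [variance(item) for item in zip(*responses)]
    # Sample variance of the summed scale score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings from four respondents on a three-item scale.
ratings = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
]
print(round(cronbach_alpha(ratings), 2))
```

Alpha approaches 1.0 as the items rise and fall together across respondents, which is the sense in which values of .86 or greater indicate that each ten-item sum can be treated as a single scale score.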

Next, a set of correlations was computed between the AMOs' perceived
leadership  development  and  the  reported  leader  behavior  of  the  superior 
officer.  The  hypothesis  was  that  some  types  of  leader  behavior  (e.g., 
developing,  consulting  and  delegating,  recognizing  and  rewarding)  would  be 
positively  related  to  the  subordinate's  leadership  development.  In  fact,  all 
of  the  correlations  between  the  MBS  scales  and  self-rated  leadership 
development  were  negative.  Nine  of  the  thirteen  were  statistically 
significant  at  the  .05  level;  however,  the  largest  negative  correlation  was 
only  -.19  (for  "supporting"  behavior). 
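
The contrast between statistical significance and the small size of these correlations can be made concrete with the usual t test for a correlation coefficient. This is a sketch under stated assumptions: the paper does not report the number of cases behind each correlation, so n = 185 (the number of usable surveys) is assumed, and 1.96 is used as a large-sample approximation to the two-tailed .05 critical value.

```python
import math


def t_from_r(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = -0.19   # largest negative correlation reported ("supporting" behavior)
n = 185     # assumed: all usable surveys contributed to the correlation

t = t_from_r(r, n)
shared_variance = r ** 2          # proportion of variance in common
significant = abs(t) > 1.96       # large-sample two-tailed test at .05

print(round(t, 2), round(shared_variance, 3), significant)
```

With a sample of this size even a correlation of -.19 clears the .05 threshold, yet it accounts for under four percent of the variance in self-rated leadership development.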

Personal Determinants of Leadership Development


The  final  major  analysis  was  accomplished  to  determine  if  there  were  any 
relationships  between  the  demographic  items  and  leadership  development  by  the 
AMO.  A  one-way  ANOVA  was  used  for  each  of  the  categorical  variables  with 
perceived  leadership  development  as  the  dependent  variable.  The  ANOVA  for 
rank resulted in significant differences between group means (p < .02), while
there was a marginally significant difference for prior enlisted service
(p < .09). In both cases, greater experience (as indicated by higher rank and
prior  service)  was  associated  with  greater  leadership  development. 
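
The one-way ANOVA applied to each categorical variable compares the variation between group means with the variation within groups. The sketch below computes the F ratio directly. The ratings are hypothetical, as the paper reports only significance levels, and the three groups merely stand in for the three ranks in the sample.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [x for group in groups for x in group]
    grand_mean = sum(all_scores) / len(all_scores)
    means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical self-rated leadership development scores by rank.
second_lieutenants = [1, 2, 3]
first_lieutenants = [2, 3, 4]
captains = [3, 4, 5]

f_ratio, df_b, df_w = one_way_anova_f(
    [second_lieutenants, first_lieutenants, captains]
)
print(f_ratio, df_b, df_w)
```

The F ratio is then compared with the critical value for the two degrees of freedom to obtain significance levels of the kind reported above.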


Discussion 

Our  major  conclusion  is  that  experience  is  the  best  form  of  leadership 
development. The AMOs in this study placed the highest importance on their
working relationships with airmen, NCOs, and peers, their observation of
superior officers, and their experiences on TDY deployments. Furthermore,
these  were  the  activities  in  which  they  reported  the  greatest  involvement. 
These  results  were  also  supported  by  demographic  data  which  indicated  greater 
perceived  leadership  development  for  those  having  higher  rank  and  those  having 
prior  enlisted  service.  It  would  appear  that  the  old  adage,  "there's  no 
substitute  for  experience",  is  indeed  true  when  it  comes  to  leadership. 

A  striking  result  on  the  opposite  end  of  the  spectrum  was  the 
comparatively low ratings given to formal development programs such as LPDP
and PME, especially for the correspondence versions. These trends were
supported by many of the written comments which emphasized the need for
experiential  training. 

Even other varieties of leadership experience, e.g., in the community,
churches and professional organizations, received relatively low ratings
versus those categories which were specific to the job. These trends would
seem  to  support  the  now  well-accepted  situational  aspect  of  leadership. 
Experience,  to  be  most  effective,  should  be  specific  to  the  job  environment. 

A  surprising  outcome  was  the  negative  correlations  observed  between  the 
superior's  leadership  behavior  and  leadership  development  of  the  subordinate. 
It  was  expected  that  at  least  some  forms  of  leader  behavior  would  enhance  the 
AMO's leadership development. The negative relationship observed could be
explained in at least two ways. It could indicate that if a junior officer
has developed well as a leader, then the superior does less leading and allows
the junior officer to lead. Another possible explanation is that the more the
AMOs feel they have developed as leaders, the more critical they are of the
superior's leader behavior.

This  research  focused  on  rather  global  issues  of  leadership  development, 
specifically  in  the  aircraft  maintenance  career  field.  Much  has  been  learned; 
however,  many  questions  remain.  Future  research  should  further  examine  the 
relative  worth  of  different  methods  of  leadership  development;  such  trends 
should  be  examined  in  various  settings  with  different  groups  of  leaders.  A 
criterion measure more objective than self-rated leadership development would
be particularly useful in follow-on work.


References 

Benton, J.D. (1981). Promoting leadership in the Air Force's management
environment. Unpublished research report, No. 0230-81, Air Command and Staff
College, Maxwell AFB, AL.

Bowers, D.G., & Seashore, S.E. (1966). Predicting organizational effectiveness
with a four-factor theory of leadership. Administrative Science Quarterly,
11, 238-263.

Dobias, L.J. (1974). An analysis of management development in the Air Force.
Unpublished research report, No. 0785-74, Air Command and Staff College,
Maxwell AFB, AL.

Gosnell, W.I. (1980). The Air Force is making occupationalists of its junior
officers. Unpublished research report, No. MS071-80, Air War College,
Maxwell AFB, AL.

Halpin, A.W., & Winer, B.J. (1957). A factorial study of the leader behavior
descriptions. In Stogdill, R.M., & Coons, A.E. (Eds.), Leader behavior: Its
description and measurement (pp. 39-51). Columbus: Bureau of Business
Research, Ohio State University.

Likert, R. (1961). New patterns of management. New York: McGraw-Hill.

Robinson, G.O. (1974). Schooling the middle manager. Unpublished research
report, No. 5406, Air War College, Maxwell AFB, AL.

Shriver, E.L., and others. (1980). Development of a leader training model and
system. Unpublished research report, AD-A082 730, Army Research Institute
for the Behavioral and Social Sciences, Alexandria, VA.

Wellins, R.S., and others. (1980). Analysis of junior officer training needs.
Unpublished research report, AD-A096 034, Army Research Institute for the
Behavioral and Social Sciences, Alexandria, VA.

Yukl, G.A. (1981). Leadership in organizations. Englewood Cliffs, NJ:
Prentice-Hall.



Performance  of  students  using  AMBTS  is  being  compared  to 
that  of  students  receiving  the  traditional  instruction. 
Additionally,  a  sample  of  students  will  be  followed  into  the 
fleet  to  determine  if  performance  differences  persist  in  the 
job environment.

AMBTS  in  Pierside  and  Reserve  Training 

The  second  initiative  grew  out  of  the  technology  of  the 
first,  but  focuses  upon  the  pierside  and  reserve  community 
components  of  the  training  continuum.  The  issues  here  are 
not  the  effectiveness  and  efficiency  of  initial  skill 
training  but  rather  maintenance  of  skills,  recertification 
of  individuals  returning  to  sea  after  protracted  shore 
assignments,  and  qualification  of  teams  whose  players  must 
learn  to  coordinate  their  individual  skills  to  maximize  team 
efforts.  The  fact  that  the  Reserves  must  accomplish  this 
training at remote sites exacerbates these problems for
them.

AMBTS  is  the  medium  for  accomplishing  the  training;  however, 
a  method  for  assessing  individual  performance  and 
prescribing  follow-on  or  refresher  training  must  be  added  to 
the  capability  of  the  basic  program.  Through  this 
application  of  the  technology,  we  hope  to  learn  how  to  apply 
what  is  known  about  the  retention  of  learning.  This  will 
help us design acceptability strategies which can be applied
in  the  fleet  and  the  reserve  community  where  there  is  a  good 
deal  of  skepticism  about  technology's  value  as  an 
instructional  tool. 

Training  Continuum  Strategies 

The  third  initiative  is  in  an  entirely  different  training 
arena,  but  builds  upon  the  training  concepts  of  the  second 
initiative.  In  this  case,  the  implementation  model  focuses 
not  upon  the  learning  and  maintaining  of  technical  skills 
but  on  the  educational  requirements  of  the  officer  accession 
continuum. We have selected an instructional continuum that
begins  with  a  Navy  school  preparing  potential  NROTC  students 
for  the  college  environment,  to  the  NROTC  itself,  to  the 
follow-on  warfare  schools,  and,  ultimately,  to  the  fleet. 

The  process  for  making  this  continuum  operate  more 
effectively includes: (1) assessing individual
capabilities  at  each  school  level  against  required 
developmental  and  professional  skills;  (2)  intervening  at 
each  point  along  the  way  to  close  deficiency  gaps  through 
the  application  of  computer-based  modules  and  other  good 
support  practices  (including  peer  tutoring);  and,  (3) 
providing  follow-on  assessment  and  deficiency  prescriptions 
to  the  next  schoolhouse  which  can  then  continue  the 
remediation/development process.

Through  this  interaction  among  those  responsible  for 
training along the continuum, we seek to add a "feedforward"
as well as a feedback process that establishes an
accountability for value added at each follow-on school. If
we  are  successful  at  packaging  these  techniques,  we  will 
have  in  place  the  basic  ingredients  of  a  true  training 
continuum  which  should  become  the  basis  for  all  Navy 
training  in  the  future.  Perhaps  we  can  then,  at  last, 
reduce  the  impact  that  manpower  and  personnel  policies  have 
upon  training  performance  and  concentrate  upon  helping 
people  learn  what  they  need  to  know  --  without  wasting  time 
on blame.

TTI has other initiatives underway. These include the
identification of applications for smoke generators in
Reserve  training,  the  employment  of  artificially  generated 
signals  in  gram  analysis  training  in  Navy  settings,  and  the 
development  and  implementation  of  a  Navy-related  functional 
skills  math  and  reading  curriculum. 

The  purpose  of  all  these  early  initiatives  is  to  package 
existing  technologies  and  instructional  materials  into 
meaningful instructional delivery systems. These systems
are  tailored  to  the  unique  instructional  environments  along 
the  continuum  and  enjoy  the  credibility  and  acceptability  of 
schoolhouse  instructors  and  fleet  supervisors.  As  we 
continue  to  improve  this  packaging  process  and  couple  it 
with such delivery concepts as requalification, we come
closer  to  exploiting  our  ability  to  intervene  in  the 
capability  of  people  to  perform.  We  can  then  allocate  our 
resources  on  continually  assessing,  building  upon,  and 
sustaining  skill  development  as  the  cornerstone  for  fleet 
readiness.


IMPLEMENTATION  LESSONS 

During  planning,  initiating,  and  executing  several 
implementation  projects,  TTI  encountered  recurring  issues 
and  problems.  By  examining  and  trying  to  solve  them,  we 
uncovered  lessons  that  should  be  heeded  during  systematic 
implementation.  These  lessons  are  consistently  found  and 
persistent  enough  that  even  the  obvious  ones  should  be 
discussed  and  all  should  be  incorporated  into  overall 
implementation  models.  Attending  to  them  leads  to  nothing 
more than good implementation practices.

1. The difficulty of implementing technology is
continually  underestimated. 

This  is  a  self-evident  truth,  but  one  that  is  generally 
ignored.  Researchers  and  operators  in  both  the  military  and 
civilian  communities  continually  underestimate  the 
difficulty  of  implementing  technology.  Researchers  and 
operators have different goals and interests; therefore,
just recognizing that the process is difficult is not likely
to solve the problem.

Perhaps  this  is  why  we  are  seeing  a  new  breed  of 
professional; and the lesson is we must encourage the
development of these people -- people who can stand,
however shakily, with one foot in the research
laboratory  and  the  other  in  the  operational  environment. 

These  specialists,  employing  these  emerging  implementation 
skills,  can  bring  the  transitioning  process  to  a  level  of 
sophistication  which  will  build  a  climate  of  operator 
acceptance  and  ownership.  Maintaining  this  climate 
throughout dissemination will best contribute to the
sustainability of new training technology and techniques in
the classroom.

2.  The  capability  of  technology  to  improve  training 
should  not  be  exaggerated. 

This  second  lesson  highlights  a  principle  which  has  long 
been discussed with regard to innovation but is often
overlooked in practice. The training equipment that can be
uncovered in military salvage depots is testimony to our
willingness to embrace new technologies as panaceas and our
subsequent  disillusionment  with  them. 

Similarly, in our zeal to adhere to the letter of our
instructional systems design law we have too often confused
familiarity  with  mastery.  We  have  also  overlooked  the 
impact  of  forgetting  on  initial  performance  in  an 
operational  setting.  It  is  now  apparent  that  exaggerated 
claims  of  improved  performance,  often  accompanied  by  poorly 
utilized  computer  technology,  helped  to  undermine  the 
credibility  of  those  endeavors  with  operators  who  knew 
better. This frequently resulted in animosity toward the ISD
process  and  the  techniques  and  technologies  it  promoted. 

At the beginning of every implementation endeavor we are
well advised to realistically formulate promises and use
technology in supporting roles and as tools where
appropriate. Our initiatives at TTI incorporate this
advice.

3. Goals determining hardware and software uses must
be carefully defined.

It is important to have goals that determine hardware and
software uses. What we are learning, however, is that we
must use great care not to structure our goals too narrowly
around near-term cost and training benefits as we perceive
them which may, in turn, restrict our ultimate
flexibility.

As we get better at recognizing technology linkages, there
will be numerous opportunities to build one system upon
another. In this way we can expand the applications for
individual, team, and crew training uses as well as leapfrog
those applications from the schoolhouse to shipboard
environments. However, it will be difficult to fully
exploit these possibilities if we have been overly
restrictive in our early-on goal setting and front-end
analysis processes. We must improve our abilities to
perform trade-off analysis and take risks.

4.  Technical  skills  must  be  acquired  in  order  to 
implement  technologies  successfully. 

Trainers  and  implementers  may  have  an  edge  over  others  when 
it  comes  to  analyzing  the  capabilities  of  technologies  to 
improve  training,  but  specific  hardware  decisions  are  often 
out  of  our  realm  of  expertise.  We  need  people  who  know 
about hardware configurations -- people who can provide
guidance  in  ordering  equipment  and  helping  us  with  hook-up, 
operating,  and  maintenance  problems. 

The  role  of  these  people  includes  the  development  of  quality 
controls  in  the  acquisition,  use,  and  maintenance  of 
hardware  and  materials,  and  monitoring  vendor  performance. 
There  is  no  reason  to  tolerate  the  shoddy  performance  that 
many  of  us  have  experienced  in  training  systems  technology. 
We  should  use  these  people  to  develop  stringent,  enforceable 
quality  controls  across  the  board. 

POLICY  ISSUES 

Our  early  attempts  to  move  technology  from  research  to 
implementation  brought  two  policy  issues  into  bold  relief. 

If  our  initiatives  are  to  be  optimally  successful  we  must 
deal  with  both  the  issue  of  a  funding  continuum  to  accompany 
the  R&D  continuum  and  with  procurement  issues. 

Funding  Continuum  Issue 

As  an  R&D  product  moves  from  exploratory  research  to 
engineering  development,  to  prototype  demonstration,  to 
pilot  testing,  and,  finally,  to  full-scale  implementation,  a 
systematic  funding  continuum  to  assure  this  orderly  progress 
is  essential  but  currently  lacking.  To  establish  this 
funding  continuum  we  must  begin  by  identifying  clearly 
stated, broadly based umbrella-type research categories
under  which  specific  projects  can  be  grouped.  These 
categories  should  be  derived  primarily  from  senior  officers 
responsible  for  training  who  should  review  the  categories 
periodically  in  order  to  renew  their  commitment  or  change 
them  as  a  result  of  changing  military  priorities. 

Specific R&D projects will fall under these umbrella
categories where they will be prioritized and appear as a
totally funded package through the 6.4 funding level. This
portion of the continuum must then be coupled with follow-on
operating dollars, initially in the form of wedges. As the
budget year comes closer, these wedges will give way to the
more specific dissemination and life cycle requirements
emerging from the 6.4 accomplishments.

Without this kind of funding continuity we will never fully
realize the potential that research-based instructional
techniques  and  technologies  hold  for  improved  performance. 

Procurement  Issues 

Software  and  hardware  procurement  and  maintenance  issues  can 
no  longer  tolerate  policy  neutrality.  In  the  early  stages 
of transitioning R&D from the laboratory to the classroom,
trainers  must  be  formally  involved  in  the  procurement 
process.  There  are  too  many  training  issues  involved  in 
purchasing  decisions  (and  many  quality  control  initiatives) 
to leave technology-skilled trainers out of the process.

Further,  we  must  weigh  the  advantages  and  disadvantages  of 
the centralized/decentralized issue in upgrading and
maintaining software and courseware. To date, each
implementation project has found its own, not always
satisfactory, solution to this problem. We must consolidate
the knowledge gained by these projects and prepare a
standardized,  supportable  policy  on  this  issue. 

Finally, policy matters associated with life cycle costing
greatly concern those responsible for training in the
classroom  and  at  pierside.  How  these  policies  evolve  is 
closely  tied  to  the  credibility  and  acceptability  technology 
applications  will  enjoy  among  our  military  colleagues. 

CONCLUSION 

We  in  Navy  training  have  embarked  upon  implementation  in  a 
formal way. We believe this to be good, sensible, and even
wise.  However,  we  suspect  that  our  initiatives,  our 
experience,  and  our  policy  concerns  differ  little  from 
others'.


The  task  before  us,  then,  is  to  learn  how  to  learn  from  each 
other;  to  share  what  we  know;  and  to  build  on  each  other's 
experiences.  How  well  we  do  these  things  will  determine  how 
well we fulfill our goals in the struggle to be more
effective in a world of scarce resources where so much
depends on human performance.


Research Needs Assessment and Technology Transfer in USAREUR

Karol Girdler and LTC Ford McLain
U.S. Army Research Institute

In October 1989, the Army Research Institute (ARI) Field Unit in the
U.S. Army Europe (USAREUR) ended its programmatic research activities, and
it was redesignated a Scientific Coordination Office (SCO). The primary
mission of the organization became research needs assessment and technology
transfer. Technology means scientific knowledge in the form of hardware,
processes, methods or ideas. Here, technology transfer refers to the
application of technology produced by ARI researchers for a new user and, in
some cases, putting old results to new use. Since one-third of the U.S.
Army is stationed in Europe and numerous ARI technologies are developed for
use by the active Army in the field, it is essential that products be
transferred and needs be considered within this setting. While identifying
research needs and promoting technology transfer, we have identified gaps in
what should be a comprehensive research and development (R&D) management
process. The purpose of this paper is to clarify the need for an R&D
management approach which is accountable for research including needs
assessment and product utilization with feedback from the user.

Highlighted here are our conclusions developed as we searched for a
model against which we could test our experiences in needs assessment and
with which to guide and judge our effectiveness in disseminating knowledge
and facilitating implementation of research products in an operational
setting. In searching for a model, we were, in part, seeking to define the
role of and the support system needed for those who carry out these
functions in the military setting.

Dissemination of research results, utilization and the relationship to
needs assessment are not new ideas. A great deal has been written on ways
to disseminate knowledge and get it used, and about the critical variables,
barriers and gateways to be encountered. Glaser (1983) compiled an
extensive review of research on the topics and reviewed several models. The
bibliography alone is 162 pages. The demand for better needs assessment
analysis and product utilization in the military, in particular, has been
addressed by military psychologists such as Freda (1980) and Shields (1977).
For example, Shields found that some technology is under-utilized or totally
rejected by the intended users because the need for the product did not
exist, the product was flawed from a user standpoint, or the researcher or
developer did not attend to the complete process of R&D management.

What is new here are our discoveries of what is possible now, and what
further support is required as we perform these functions in an operational
environment. Also unique is that we may be the only unit in the R&D
community with these functions as the main mission of the organization.


This paper is an unofficial document intended for limited distribution to
obtain comments. The views, opinions, and/or findings contained in this
document are those of the authors and should not be construed as the
official position of ARI or as an official Department of the Army position,
policy, or decision.

We have speculated and forayed into our environment with a number of
simultaneous actions in accomplishing our mission. We have had successes,
and at all times we have come back to the central question of how our
activities do fit or should fit into a system of R&D management and a model
of knowledge utilization. Our aim is to do more than "muddle through" with
occasional immediate successes; it is to influence the further
institutionalization of our activities and provide wider support for such
action throughout the military and the R&D community. Within the military,
centralized management of R&D may be too big to handle; but a change in the
sequencing of and habitual relationships within the R&D community and
between R&D people and users still needs to be considered.

One way to provide institutionalization of R&D management for this
purpose is on a case-by-case basis, that is, a case management system for
research products. If such longitudinal management existed in a clearer,
more concise form, to include greater involvement with the users, we would
be in a better position than we are now. However, our experience suggests
to us that product case management may not be the most effective procedure.
Such a potentially regulation-based system could solidify horizontal
communication in organizations or result in paperwork drills. In addition,
the concentration on a product-based system could limit the perception and
creativity of those responsible. The longitudinal mindset of such a system
could lead managers to see R&D management as a process where each product
has a definite start and end unconnected to other products and issues.

Rather than a product case management process, the ultimate support system
for scientists engaged in these activities would recognize research needs
assessment and technology transfer as two sides of one coin, the activities
occurring in a cycle and occurring simultaneously for many products. The
common element to be managed in this setting of multiple activities could be
not products, but concepts in relationship to central issues.

We are interested in institutional support for the scientist operating
in the role we are defining - one who deals simultaneously with a number of
user needs and research products and seeks to bring products to the intended
user, as well as others who could benefit from the knowledge. At the same
time, information is transmitted about new users and new needs to those
scientists producing technology. These needs might be answered with new
research or existing products which can be implemented. This in turn may
suggest new needs relevant to the product implemented, as well as to other
research, so that the full cycle has occurred. The cycle is not performed
for one product and one need. The scientist may have several needs or sets
of needs in mind for various organizations, or several possible products, and
be at various stages of the cycle in different settings. R&D management
processes could create a support base for performing these activities. The
research process which receives the most support and management now is the
normal scientific method or process of problem review, hypotheses
generation, etc. The role being described here of making broader linkages
and connections takes us to areas which are previous to problem review and
which follow research project completion, and even follow product
utilization.

The continuing questions are: How do we organize support for the
scientist in this role? Who in effect is a research manager? How do we
organize the work itself to keep it flexible and avoid bureaucracy which
could stifle communication needed to identify new uses for technology? The
role is that of a gatekeeper of established channels of information
exchange, and of a gateway creator who must ascertain focal points where
"gates" can be created between the huge systems of the military and the R&D
community. Where are the interfaces between the two systems most permeable
and facilitation most likely to allow information to be passed and used? In
this age of information, it might require only a few people to manage these
aspects of the R&D process given access to the right tools and the proper
vantage point or perspective. The individual in this role needs access to
the entire R&D community to assist in managing research in this sense. The
database or information network to support such work needs to include not
only technical information about current products and their development, but
information about issues and trends of primary importance to the military
and information about new concepts of technology.

A few examples may help show the reader where we see opportunities to
organize successful exchanges of information, to facilitate the process, and
suggest management needed for a support base. Keep in mind that exchange
means information passed both ways. In effect, the scientist involved in
this aspect of research has two major groups of "clients": the R&D
community and the military user.

There are a number of ways new research needs are identified. As noted
above, need identification is often performed simultaneously with
implementation or information dissemination. At our office in particular,
one form this multiple operation takes is what we call Technical Advisory
Service (TAS). In this function, military users make requests for our
assistance based on their perceived operational needs, and their knowledge
of our expertise which we have made known in the community.

Our first example concerns TAS being carried out at the Warrior
Preparation Center (WPC). The WPC is a joint operation by USAREUR and the
United States Air Force in Europe (USAFE) to support command and battle
staff training. The center provides training for the operational level of
warfare for corps through Army Groups in joint exercises with the Allied
Tactical Air Force (ATAF). Our organization had originally approached WPC
to possibly assess needs at this new center which might be addressed by ARI
research. After the WPC staff members became familiar with the SCO mission,
functions, and areas of expertise, they began to identify operational and
developmental requirements which could be addressed by the ARI SCO.

In this case, establishing a relationship and facilitating discussions
with WPC resulted in benefits to both organizations. We have been able to
provide short term TAS yielding ideas and answers for WPC operational
development, specifically in terms of data collection and data analysis.
The work was done by one psychologist and by one scientist "loaned" from
ARI HQ for an intensive six week consultation to WPC. By applying our
consultation skills at WPC and other organizations, we build credibility for
ARI in USAREUR and lay the foundation for a positive reception of future
introduction of ARI technology, and we create support for the process of our
work. The WPC organization helps us do our job of identifying research
leads as they continue a dialogue with our office, in which they describe
exercises, issues of measurement, and performance at corps and echelon above
corps level. In the future, we believe we might be able to introduce to WPC
results from ongoing ARI research on corps performance and use feedback from
WPC as guidance for further research and development. The lesson learned is
that consultation skills and establishment of networks or relationships
provide the basis for our role.

Our second example concerns training technology being developed by ARI
for the Army Training Battle Simulation System (ARTBASS), which is to be
implemented as the Army's primary system for training battalion command
groups in the command and control of combat operations. ARTBASS uses a
computer to simulate unit actions and weapons effects, and to calculate the
changing status of personnel, equipment, ammunition and fuel in simulated
combat. The system is portable, and the Department of the Army plans to
field enough systems to train every maneuver battalion in the active Army
and every Reserve component.

In our larger efforts to clarify issues and needs in command and
control below corps in USAREUR, we attempted to ascertain who the users for
ARTBASS training technology are in USAREUR and how the training technology
being developed by ARI might be fielded along with the new system it
supports. Previous discussions with military users in the field had led us
to identify an important training issue in USAREUR: Supportive training
products frequently arrive much later, if ever, than new training systems or
new equipment. The need to make connections was clear. However, our
experiences in the transfer of training technology for the ARTBASS raised
more issues than answers about the R&D process vis-a-vis our role.

Our attempts to make linkages in USAREUR and to have prototype ARI
training products for the ARTBASS examined were well-received by the 7th
Army Training Command (7ATC) in USAREUR, the primarily responsible
organization for implementation of ARTBASS. Likewise, the Army Materiel
Command Europe (AMC) encouraged our coordination with them in their role in
fielding the system. Our informal information exchange and coordination
appeared to be holding up well within USAREUR; however, a subsequent meeting
with a representative from the U.S. Program Manager for Training Devices (PM
TRADE) office left us with questions about our role. The representative
from the agency primarily responsible for overseeing development and initial
implementation of ARTBASS was unaware of who ARI was and why coordination in
the USAREUR environment was desirable. Coordination for fielding of
training technology developed by ARI was not on the list of "things to do"
when fielding new devices. Formally, at least to this representative, ARI
was not considered as a contributor to the overall project. When we were
questioned about how centralized coordination of training technology and new
devices was done in the States before fielding to USAREUR, we had to respond
that we did not know. The representative's proposed efforts to rectify the
lack of coordination were in terms of a centralized, one product or one case
fix, rather than a systems approach. Our coordination with the U.S.
representative virtually came to a halt.

At this point we are left wondering. We have been told numerous times
that new devices and equipment often arrive in USAREUR without programs of
instruction or other types of supportive training materials. At the same
time, we were often able to point to finished ARI products which could
assist in these situations. It fell to the SCO in Europe to pursue the
connections. We have not yet answered the questions. Should the
coordination role be assumed by a central R&D management level? Are many
decentralized efforts necessary? When and with whom? We continue to build
our linkages in Europe and pursue the expansion of linkages in the entire
R&D community. It is possible, however, that we will find ourselves outside
the system as we create our informal network.

Our third example concerns a product for which our lateral
communications have identified potential new users, and for which we are
assisting in a test and evaluation process to boost user involvement in the
product's development. The success we have in making connections is based
largely on the fact that a new type of program has recently been initiated
and based at HQ ARI to provide research product management, and which
provides some formal mechanisms to support a role such as we are developing.

The Joint Service R&D Program sponsored by DOD was initiated to offer
end users the capability to participate in the development of prototype
technologies. One product from this program under development jointly by
the Navy and Army is the Personal Electronic Aid for Maintenance (PEAM).
This highly portable, interactive device would replace maintenance manuals
and provide an aid to performance of Organizational Maintenance. The
initial application is for the Turret Mechanic of the M1 Abrams Tank.

Our involvement with the Joint Service program combined with our
awareness of needs and issues in Europe has permitted the SCO to make
connections which may expand the use of PEAM to address a related but
different need beyond its planned application. We are currently involved in
evaluating the feasibility of expanding the portable, step by step
electronic guide into versions for use by German maintenance mechanics who
do not use the English language manuals well. In addition, other products
based on the concept of easy access to procedural knowledge might become
involved in addressing the maintenance issue.

We see the PEAM example as an important aspect of the total R&D process.
Within the greater R&D community, when these types of connections are made,
they tend to be as a result of personal efforts and interests. In the case
of PEAM, the Joint Service R&D Program sets the stage for a broader view of
research and development. When this type of R&D management is combined with
the flexibility to share information and communicate about an issue
laterally through military operational settings and R&D organizations, good
ideas can possibly surface more easily and become reality.

Our final example concerns our activities at 7ATC where informal
linkages built through our use of consultant skills have created gateways
through which finished ARI products have been delivered to users.

7ATC is responsible for planning, developing, managing, and
coordinating training requirements and programs for USAREUR. An initial
visit with 7ATC staff members showed that they were generally unaware of
ARI, and yet very interested in the type of information ARI was capable of
providing. After our learning about 7ATC's mandate and operations from
staff members and written materials, we briefed down through the 7ATC chain
of command, addressing what we identified as their staff concerns, to
establish a formally recognized gateway to this organization.

We began our work with one directorate at a time, first establishing
needs and delivering information, to establish a "track record" within the
organization of delivering what we promised, before going to the next area.
We currently play a consulting role, visiting 7ATC on a routine basis, and
being called in to address specific problems. This allows for follow-up of
previous work, and identification of new needs. It must be recognized that
there is only limited organizational memory within 7ATC directorates. When
a staff member leaves, our work essentially leaves with him. Likewise, when
a new staff member comes on board we must brief him on our mission and
provide a package of materials that identifies what has been, and can be,
done to support staff members' job requirements.

There is still no routine delivery of ARI products at 7ATC for several
reasons. The 7ATC staff carries a very heavy workload, and does not have
the luxury of reading ARI reports, even when these are readily available.
We have learned through experience that staff information needs must be
clearly defined, and applicable ARI material discussed one-on-one if the
information is to be recognized and applied to the 7ATC mission, and
eventually in USAREUR. Using this approach, we have been able to identify
and address a number of information needs using past research products and
current ARI draft research materials. We have also been able to identify
future products, both short-term and long-term, which address 7ATC needs,
and have provided these as they were released. Based on research product
feedback and new operational needs at 7ATC, the SCO currently identifies
needs for incorporation into ongoing ARI research, thus continuing the needs
assessment - technology transfer cycle.

In conclusion, we are looking at research needs assessment, information
dissemination, and utilization, which comprise a scientific field of
continuing development - one in which a number of models have been suggested
but none are definitive. We are attempting to create not only gateways to
exchange information, but relationships which can be accessed through the
gateways and through which the user can become a fuller part of the R&D
cycle to influence products both before and after their creation. As we
perform the activities, we are constantly examining our actions and
assessing how they fit or do not fit smoothly into overall R&D management.

We are searching for system solutions to facilitate our roles. If we do
not, we will always be treating the military needs on a band-aid basis, and
when we, the gatekeepers, leave, the gates may close. By defining our role
and identifying the types of support and credibility needed institutionally
for the role, we hope to get consistently more use from each research dollar
on a long-term basis. The payoff is in effective, user-oriented research
management.

The Reduction of Standard Errors of Equipercentile Test Equating through
Negative Hypergeometric Presmoothing*

I. INTRODUCTION

Test  equating  is  the  process  of  finding  which  scores  on  two  or  more  similar  tests 
correspond  to  the  same  level  of  ability  in  the  population  of  potential  examinees.  The  need 
for  test  equating  arises  as  a  result  of  many  considerations.  It  is  often  valuable  to  have 
more  than  one  version  or  form  of  a  test.  When  more  than  one  version  or  form  of  a  test  is 
available,  the  particular  form  taken  by  an  examinee  should  not  affect  the  examinee's 
expected  score. 

The  replacement  of  operational  tests  requires  equating  when  the  scores  on  the  new 
tests  are  to  be  used  in  the  same  predictive  or  evaluative  equations  or  in  the  same  manner 
as  were  the  old  scores.  Test  equating  may  be  carried  out  in  any  of  a  large  number  of 
different  ways,  some  of  which  are  of  recent  origin  and  are  technically  sophisticated,  and 
some  of  which  have  been  in  use  for  several  decades  (see  Holland  &  Rubin,  1982).  This 
report  addresses  only  equipercentile  test  equating  as  applied  to  two  equivalent  groups
(Angoff,  1971).  The  terms  "reference  test"  and  "experimental  test"  are  used  to  indicate, 
respectively,  the  test  whose  score  metric  is  to  be  used  for  the  results  of  both  tests,  and 
the  test  whose  score  is  to  be  converted  to  the  units  of  the  other  test.  For  example,  if  an 
existing  test  known  as  Form  K  is  to  be  replaced  by  a  similar  test  known  as  Form  M,  Form
K  would  be  the  reference  test  and  Form  M  would  be  the  experimental  test.

As with any procedures having the goal of estimating population characteristics
based on data obtained from a sample, there are always sample-dependent errors present
in test equating. If an equipercentile equating were to be done twice with similar
samples, the results would differ. The extent of such differences has been estimated by
Lord (1982) and their magnitudes appear as the standard errors of equipercentile equating.
As with all standard statistical procedures, the size of the expected errors decreases
in inverse proportion to the square root of the sample size. It is thus operationally
impractical to reduce errors beyond a certain amount by increasing sample sizes. For
example, decreasing the error to one-fourth the size of the error associated with a given
sample size would require using a sample 16 times the size of the original sample. As a
consequence, practitioners of equipercentile test equating have looked for other ways to
reduce equating errors. They have most frequently used the methods of smoothing.
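
The sample-size arithmetic in the preceding paragraph can be checked with a short sketch. It assumes only that the standard error is proportional to one over the square root of the sample size; the function names and the base sample size are illustrative and do not come from the report.

```python
import math


def sample_size_multiplier(error_ratio):
    """Factor by which the sample must grow so that the standard error
    shrinks to `error_ratio` times its current value, assuming the
    standard error is proportional to 1 / sqrt(n)."""
    if not 0 < error_ratio <= 1:
        raise ValueError("error_ratio must be in (0, 1]")
    return 1.0 / error_ratio ** 2


def relative_standard_error(n, n_base):
    """Standard error at sample size n relative to that at n_base."""
    return math.sqrt(n_base / n)


if __name__ == "__main__":
    base = 2000  # illustrative sample size, chosen only for the printout
    for ratio in (0.5, 0.25, 0.1):
        k = sample_size_multiplier(ratio)
        print(f"error x {ratio}: sample must be {k:.0f} times larger "
              f"({base * k:.0f} examinees)")
```

Halving the error already quadruples the required sample, which is why smoothing is attractive as an alternative to collecting more data.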

Smoothing 

Two  general  classes  of  smoothing  methods  were  used.  A  third  class  is  made  up  by 
combining  a  smoothing  method  from  the  first  class  with  one  from  the  second  class.  First, 
presmoothing  is  defined  as  the  process  of  smoothing  the  observed  score  frequency 
distributions  prior  to  the  equating.  Second,  postsmoothing  is  defined  as  the  process  of 


*This report is excerpted from a technical report by B. Fairbank, Jr. titled
"Equipercentile Test Equating: The Effects of Presmoothing and Postsmoothing on the
Magnitude of Sample-Dependent Errors," AFHRL-TR-84-64, Air Force Human Resources
Laboratory, Brooks Air Force Base, Texas, 78235. The research here reported was
supported by a contract between the Air Force Human Resources Laboratory and
Performance Metrics, Inc. The opinions are those of the author. The full technical report
is available from the author.

smoothing the equipercentile points after equating. Third, combined smoothings involve
presmoothing and postsmoothing applied consecutively. The common intent of all three
smoothing methods is to remove small sample-dependent fluctuations from the
nonsmoothed equatings so that the small sample equatings will more nearly approximate
the asymptotic equatings, or those which would result from the use of samples so large
that the sample-dependent errors approach zero. The extent to which the various
methods achieve this common intent is investigated by this research. Seven presmoothing
methods, seven postsmoothing methods, and five combined smoothing methods were used
as follows:

A. Presmoothing Methods

1. 3-point moving medians
2. 5-point moving medians
3. 3-point moving weighted averages
4. 5-point moving weighted averages
5. 5-point moving weighted averages with root transformation
6. 4253H Twice
7. negative hypergeometric

B. Postsmoothing Methods

1. linear regression
2. quadratic regression
3. cubic regression
4. orthogonal regression
5. logistic ogive
6. cubic splines
7. 5-point moving weighted averages

C. Combined Smoothers

1. negative hypergeometric + orthogonal regression
2. negative hypergeometric + quadratic regression
3. negative hypergeometric + 5-point moving weighted averages
4. 3-point moving weighted averages + 5-point moving weighted averages
5. negative hypergeometric + cubic splines
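
As an illustration of the simplest presmoothers in the list, the following is a minimal sketch of a centered moving weighted average applied to a score-frequency distribution. The report does not state the weights or how the end points were handled, so the default weights (1/4, 1/2, 1/4) and the renormalization at the ends are assumptions made only for this example, and the function name is hypothetical.

```python
def moving_weighted_average(freqs, weights=(0.25, 0.5, 0.25)):
    """Smooth a list of score frequencies with a centered moving
    weighted average.

    `weights` must have odd length (3 for a 3-point smoother, 5 for a
    5-point smoother). Near the ends of the score range only the weights
    that fall inside the range are used, renormalized to sum to one.
    Both the default weights and this end-point rule are assumptions.
    """
    if len(weights) % 2 == 0:
        raise ValueError("weights must have odd length")
    half = len(weights) // 2
    smoothed = []
    for i in range(len(freqs)):
        total = 0.0
        weight_sum = 0.0
        for offset, w in enumerate(weights, start=-half):
            j = i + offset
            if 0 <= j < len(freqs):
                total += w * freqs[j]
                weight_sum += w
        smoothed.append(total / weight_sum)
    return smoothed
```

A 5-point version is obtained by passing five weights, for example `(0.1, 0.2, 0.4, 0.2, 0.1)`, which are again illustrative values.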

The final presmoothing method (see Keats & Lord, 1962; also Lord & Novick, 1968,
pp. 515-520) is one devised explicitly for smoothing or fitting frequency distributions of
test scores. The distribution is the negative hypergeometric, whose appropriateness is
derived from a binomial error model of test scores. The model assumes several technical
conditions, one of which is equivalent to the assumption that all of the items on the test
whose score distribution is being fit are equally difficult. That condition is known to be
false in the case of most tests, but the fit of the negative hypergeometric is still good
enough to make it promising for further study (Keats & Lord, 1962).
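
The negative hypergeometric distribution of Keats and Lord is the distribution also known as the beta-binomial. A minimal sketch of presmoothing with it follows. The fitting rule used here, matching the observed mean and variance, is an assumption chosen for simplicity; the report does not describe its estimation procedure, and all function names are hypothetical.

```python
import math


def beta_binomial_pmf(k, n, a, b):
    """P(X = k) for a beta-binomial (negative hypergeometric) score
    distribution on 0..n with shape parameters a, b > 0."""
    log_p = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
        + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    )
    return math.exp(log_p)


def fit_by_moments(freqs):
    """Estimate (a, b) from frequencies of scores 0..n by matching the
    observed mean and variance (an assumed fitting rule)."""
    n = len(freqs) - 1
    total = sum(freqs)
    mean = sum(k * f for k, f in enumerate(freqs)) / total
    var = sum(f * (k - mean) ** 2 for k, f in enumerate(freqs)) / total
    p = mean / n
    binomial_var = n * p * (1 - p)
    # Variance in excess of the binomial gives rho = 1 / (a + b + 1).
    rho = (var / binomial_var - 1) / (n - 1)
    if not 0 < rho < 1:
        raise ValueError("moments are outside the beta-binomial range")
    s = 1 / rho - 1  # s = a + b
    return p * s, (1 - p) * s


def presmooth(freqs):
    """Replace observed frequencies with fitted beta-binomial frequencies."""
    n = len(freqs) - 1
    a, b = fit_by_moments(freqs)
    total = sum(freqs)
    return [total * beta_binomial_pmf(k, n, a, b) for k in range(n + 1)]
```

The smoothed frequencies keep the sample total and the observed mean and variance, while the irregularities of the raw distribution are replaced by a two-parameter curve.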

Objectives 

The aim of the present effort was to evaluate the effects of various different
methods of presmoothing, postsmoothing, and combined smoothings on the accuracy of
test equating. The study was exploratory in nature, designed to determine which methods
hold the most promise for operational use.

II.  METHODS

General  Plan 

The  plan  underlying  this  investigation  was  to  use  three  different  approaches  to
determine  the  effectiveness  of  each  of  14  unitary  smoothing  methods  and  five  combined
smoothing  methods.  The  first  approach  used  simulated  tests  and  examinees;  the  second

Equatings

All test equatings were performed using the equipercentile method described by
Lindsay and Prichard (1974). For the unsmoothed equatings and the equatings to which
only postsmoothing was to be applied, the raw frequency files were equated. When the
equatings involved presmoothing, the smoothed frequency estimates were equated.
Following the equatings and smoothings, each test or simulated test had associated with it
a criterion equating, an unsmoothed equating, and 19 smoothed equatings, one for each of
the smoothing methods used.
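
To make the equating step concrete, here is a generic sketch of equipercentile equating from two frequency distributions. It is not the Lindsay and Prichard procedure, whose details are not given in this report; it assumes the common textbook convention that each integer score is spread uniformly over the half-unit interval on either side of it, and the function name is hypothetical.

```python
def equipercentile_equate(x_freqs, y_freqs):
    """Map each integer score on the experimental test X to the score on
    the reference test Y that has the same percentile rank.

    freqs[k] is the number of examinees scoring k; both samples must be
    non-empty. Each integer score k is treated as spread uniformly over
    (k - 0.5, k + 0.5), an assumed convention. Equated scores are
    returned as unrounded decimal fractions.
    """
    nx, ny = sum(x_freqs), sum(y_freqs)
    equated = []
    below_x = 0
    for fx in x_freqs:
        # Percentile rank of the X score at its midpoint, expressed as a
        # cumulative count on the Y sample.
        target = (below_x + fx / 2) * ny / nx
        below_x += fx
        below_y = 0
        for y, fy in enumerate(y_freqs):
            if fy > 0 and below_y + fy >= target:
                # Interpolate linearly within the interval around y.
                equated.append(y - 0.5 + (target - below_y) / fy)
                break
            below_y += fy
    return equated
```

Either raw or presmoothed frequencies can be passed in, which mirrors the distinction the text draws between unsmoothed equatings and equatings that involve presmoothing.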

Analysis  of  Equating  Results 

Each of the five tests, three simulated and two operational, had associated with it
one criterion equating, 100 unsmoothed equatings based on sample sizes of 2,000 (called
the "small sample"), and 100 sets of 19 smoothed equatings based on the same samples.
The question of interest is the effect of the smoothings on the accuracy of the equatings.
A deviation is a difference between an equated score obtained with a small sample and an
equated score based on a criterion equating. At each observed (i.e., integer) score on the
experimental test, the corresponding score on the reference test was found using the
criterion equating. The equated scores were found as decimal fractions not rounded to
the nearest integer. The score corresponding to the same experimental test score was
then found for the unsmoothed small sample equating and for each of the 19 smoothed
equatings. The differences between the equated score based on the criterion equating and
the equated score based on the small sample equatings were found for each possible score
on the experimental test, for the unsmoothed and for the smoothed equatings, for all 100
replications. These differences, or deviations, were the raw data used for evaluating the
smoothings. Across the 100 small sample equatings, the deviations at each score
were combined to give a general measure of deviation at each score.
Three such deviation measures were computed.

The first measure is the Root Mean Square Deviation (RMSD), found by taking the
square root of the mean of the squared deviations across all 100 samples. The second
measure is the Average Absolute Deviation (AAD), or simply the mean of the absolute
values of the deviations computed across all samples. The third measure is the average of
the signed values of the deviations (ASD), found by taking the mean of the deviations
across all 100 replications. ASD differs from AAD in that the absolute values are not
taken before the mean is computed. ASD is sometimes called "bias," or "statistical bias,"
but in the context of testing the term "bias" denotes other phenomena and so is less
appropriate than "ASD."
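
The three measures can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: it assumes RMSD is the square root of the mean squared deviation (the usual reading of "root mean square"), it assumes a deviation is the small-sample equated score minus the criterion equated score, and the example values are hypothetical.

```python
import math

def deviation_measures(deviations):
    """Summarize the deviations observed at one score point.

    deviations: one value per replication, each taken here as
    (small-sample equated score) - (criterion equated score).
    Returns (RMSD, AAD, ASD).
    """
    n = len(deviations)
    rmsd = math.sqrt(sum(d * d for d in deviations) / n)  # root mean square deviation
    aad = sum(abs(d) for d in deviations) / n             # average absolute deviation
    asd = sum(deviations) / n                             # average signed deviation
    return rmsd, aad, asd

# Hypothetical deviations from four replications at a single score point.
example = [0.3, -0.1, 0.2, -0.4]
rmsd, aad, asd = deviation_measures(example)
print(round(rmsd, 3), round(aad, 3), round(asd, 3))
```

Because positive and negative deviations cancel in ASD but not in RMSD or AAD, a smoothing method can lower RMSD and AAD while raising ASD.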


III.  RESULTS  AND  DISCUSSION 

It  was  found  that  with  some  smoothing  methods,  especially  the  presmoothing 
methods,  smoothing  resulted  in  large  increases  in  the  deviation  measures  for  very  low  test 
scores.  In  some  cases  the  increases  were  so  great  that  graphing  them  required  such  a 
large  rescaling  of  the  figures  that  the  more  important  deviations  in  the  middle  ranges  of 
the  test  could  not  be  represented.  These  large  induced  deviations  are  seen  as  being  of 
little interest because they occurred at score values which were lower than the guessing
level  on  a  test,  and  so  were  not  associated  with  meaningful  measures  of  ability.  In  order 
to  show  the  more  relevant  deviations  effectively,  the  figures  do  not  present  information 
on  the  levels  of  RMSD,  AAD,  or  ASD  at  test  scores  below  the  guessing  level  for  each  test. 

Of  the  14  smoothing  methods,  negative  hypergeometric  smoothing  was  uniformly 
most effective in reducing root mean square error. The results of that method are shown
in  Table  1. 

Table 1. Summary of the Averaged Effects of Presmoothing by the Method of Negative
Hypergeometric

                        Proportion of Mean Deviations

Test Length             RMSD       AAD        ASD

Simulated Tests
    15                  .891       .903       2.919
    30                  .865       .867       3.596
    50                  .852       .861       3.453

Operational Tests
    20                  .905       .908       2.008
    25                  .966       .989       7.479

Mean                    .896       .906       3.891

Note. Tabled values represent RMSD, AAD, and ASD as proportions of the values found
without smoothing. Averages taken over all samples at all scores above chance level.

To  evaluate  the  effects  of  smoothing,  particularly  its  effects  on  deviations,  it  is 
helpful  to  consider  such  deviations  within  the  context  of  the  accuracy  of  ability  or 
achievement  tests  more  generally.  The  standard  errors  of  equating  discussed  in  this 
report  are  not  the  only  measurement  errors  which  arise  in  the  testing  process.  There  are 
also  standard  errors  of  measurement  that  are  intrinsic  to  any  test  which  is  not  perfectly 
reliable. The following formula (Allen & Yen, 1979) relates reliability (R), standard error
of measurement (SE), and test score standard deviation (SD):

$$SE = SD \sqrt{1 - R}$$

Thus, the standard error of measurement for the experimental test of length 15, based on
a reliability (KR-20) estimate of .80 and a standard deviation of 3.28, is 1.47. Similarly,
the standard error of measurement for the experimental test of length 30 is 2.20, and that
for the experimental test of length 50 is 2.74. The corresponding average standard errors
of equating, as estimated by Lord's formula, are .15, .30, and .51. Thus the standard error
of equating ranges from only about 10 to 20 percent of the standard error of
measurement.
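
The arithmetic in this paragraph can be checked with a short sketch. It uses the relation SE = SD * sqrt(1 - R) and the figures quoted in the text; the third standard error of equating is read here as .51, and the function name is only illustrative.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SE = SD * sqrt(1 - R), the relation cited from Allen & Yen (1979)."""
    return sd * math.sqrt(1.0 - reliability)

# Test of length 15: KR-20 of .80 and a standard deviation of 3.28.
sem_15 = standard_error_of_measurement(3.28, 0.80)
print(round(sem_15, 2))  # 1.47

# Standard errors of equating and of measurement quoted for lengths 15, 30, and 50.
equating_errors = [0.15, 0.30, 0.51]
measurement_errors = [1.47, 2.20, 2.74]
ratios = [e / m for e, m in zip(equating_errors, measurement_errors)]
print([round(r, 2) for r in ratios])
```

The ratios come out at roughly 10, 14, and 19 percent, which is the "10 to 20 percent" range stated above.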

The results of smoothing by the negative hypergeometric are the only ones which
show consistent improvement in RMSD and AAD as a consequence of smoothing. The
effects are particularly impressive with the simulated tests, presumably in part because
the criterion equatings for those tests are nearly perfect rather than estimated from very
large samples. The gains are not uniform across the tests. On the shorter tests at lower
scores, the measures of RMSD and AAD actually increase as a consequence of using the
negative hypergeometric. The beneficial effects of the negative hypergeometric do not
extend to the measures of ASD. The ASD increases both globally and locally, sometimes
quite dramatically. These increases were expected at the lower end of the test, where
guessing is a factor, but increases at the upper end were not expected. It must be noted,
however, that the ASD figures were low initially, so that a tripling of ASD may still denote
an acceptably low level.

Why is it that the negative hypergeometric smoothing method outperforms the other
presmoothers? It is likely that it is in part because that smoother takes into account all
of the information in a distribution's mean and standard deviation in arriving at the
smoothed frequency for each point. Although the negative hypergeometric does require
the assumption that all items are equally difficult, an assumption usually contradicted in
practice, its success as a presmoother indicates that its use is robust against violation of
this assumption. Furthermore, among the seven presmoothers investigated, only the
negative hypergeometric is based on a mathematical model of testing.

The present study is limited in several respects, all of which may tend to reduce its
generalizability to other applications.

First,  only  five  tests  were  used:  two  operational  and  three  simulated. 
Generalizations to other tests may be inadvisable if the tests do not statistically resemble
those  used  for  this  study. 

Second,  the  tests  used,  especially  the  simulated  tests,  may  be  more  similar  to  each 
other  than  are  most  operationally  equated  tests.  Generalization  to  less  similar  tests  is  of 
questionable  appropriateness. 

Among the presmoothing methods, the negative hypergeometric and, by extension,
other  smoothers  of  the  same  beta  binomial  family,  deserve  consideration  for  operational 
use.  If  any  of  the  presmoothers  studied  here  is  to  be  adopted,  then  the  negative 
hypergeometric  would  be  the  most  appropriate.  It  has  the  effect  of  reducing  RMSD  by 
about  ten  percent,  a  benefit  which  could  also  be  achieved  by  increasing  sample  size  by 
about  20  percent. 
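
The trade-off between smoothing and sample size can be illustrated with a rough calculation. The sketch assumes that equating error shrinks in proportion to one over the square root of the sample size, which is a common approximation and not a result reported in this study; the function name is illustrative.

```python
def equivalent_sample_size_factor(proportion_of_error):
    """Factor by which N must grow to shrink error to the given proportion.

    Assumes error is proportional to 1 / sqrt(N), so error falls to a
    proportion p of its original value when N is multiplied by 1 / p**2.
    """
    return 1.0 / proportion_of_error ** 2

# A ten percent reduction in RMSD leaves 0.90 of the original error.
factor = equivalent_sample_size_factor(0.90)
print(round(factor, 3))
```

Under this assumption the required increase is a little over 20 percent (a factor of about 1.23), which is in line with the rough figure given above.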

WEIGHTED LIKELIHOOD ESTIMATION OF ABILITY IN ITEM RESPONSE
THEORY WITH TESTS OF FINITE LENGTH


Thomas  A.  Warm,  Ph.D. 

U.S.  Coast  Guard  Institute 

A new method of statistical estimation, Weighted Likelihood
Estimation (WLE), was discovered in Warm (1985a), and a new
theorem of mathematical statistics was proved in Warm (1985b).

The theorem states that WLE has zero first order bias, in
contrast to Maximum Likelihood Estimation (MLE) and Bayesian
Modal Estimation (BME), which are both biased. WLE was applied to
ability (θ) estimation in Item Response Theory (IRT).

METHOD 

Using Monte Carlo methods, WLE(θ) was compared to MLE(θ) and
BME(θ) on 12 conventional tests with 10 to 60 items, a-parameters
of 1 or 2, and normally distributed b-parameters. The three
estimators were also compared on two tailored tests with optimal
b-parameters. One tailored test had an infinite item bank and
all a = 2.00. The other tailored test simulated a finite item
bank with declining a-parameters. For all tests all c = 0.20.

RESULTS 

Partial results are presented in the figures below. For
complete results, see Warm (1985b).

In both conventional and tailored tests WLE(θ) was less
biased than both MLE(θ) and BME(θ). In addition WLE(θ) had small
variance over the entire range of the θ-scale, as well as small
mean squared error even at noncentral θ.

DISCUSSION 

The relative unbiasedness of WLE(θ) makes this estimator
particularly appropriate in applications of IRT for which the
parameter invariance property is important.

Two new insights for MLE(θ) were discovered: 1) natural,
rational bounds, and 2) a conditional analogy to the attenuation
paradox in tailored tests with high a-parameters.

The heart of WLE(θ) is a weighting function, w(θ), which is
multiplied by the likelihood function, and the product is
maximized. This weighting function, which removes the bias and
uncontrolled variance of MLE(θ), is a function of θ and the item
parameters, and is specific to each test. It was shown to be
equal to the square root of test information for the one- and
two-parameter models of IRT, and equal to a closely related
function for the three-parameter model.
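
The idea of weighting the likelihood can be sketched for the two-parameter logistic model, where the abstract gives w(θ) as the square root of test information. Everything else below is an assumption made for illustration: the logistic form without a scaling constant, the grid search used in place of a proper optimizer, and the hypothetical item parameters. The three-parameter weighting function is not reproduced.

```python
import math

def p_correct(theta, a, b):
    """Two-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, items, responses):
    total = 0.0
    for (a, b), u in zip(items, responses):
        p = p_correct(theta, a, b)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total

def information(theta, items):
    """2PL test information: sum of a^2 * P * (1 - P) over items."""
    return sum(a * a * p_correct(theta, a, b) * (1.0 - p_correct(theta, a, b))
               for a, b in items)

def estimate(items, responses, weighted, lo=-6.0, hi=6.0, steps=1201):
    """Grid search for the theta maximizing L(theta), or w(theta) * L(theta)."""
    best_theta, best_value = lo, -math.inf
    for k in range(steps):
        theta = lo + (hi - lo) * k / (steps - 1)
        value = log_likelihood(theta, items, responses)
        if weighted:
            # log of w(theta) = sqrt(information) is half the log information
            value += 0.5 * math.log(information(theta, items))
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta

# Hypothetical (a, b) item parameters and an all-correct response pattern.
items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 0.5)]
all_correct = [1, 1, 1, 1]
mle = estimate(items, all_correct, weighted=False)
wle = estimate(items, all_correct, weighted=True)
print(mle, wle)
```

With a perfect score the unweighted likelihood keeps rising, so the MLE runs to the edge of the search grid, while the weighted product has an interior maximum. This illustrates how the weighting function controls the estimate at the extremes.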

LEADER BEHAVIOR AND THE PERFORMANCE OF FIRST TERM SOLDIERS


Leonard  A.  White,  Ilene  F.  Gast  and  Michael  G.  Rumsey1 
U.S.  Army  Research  Institute  for  the  Behavioral  and  Social  Sciences 


A large Army project is underway to validate new and current predictors
of first term soldier performance. A major objective of this effort
is to increase Army organizational effectiveness by improving the soldier-job
match. This will be accomplished by developing a set of selection and
classification measures (predictors) and performance criteria and then
empirically demonstrating relationships between the predictors and performance.


It is recognized that job performance is not only related to characteristics
which are measurable and identifiable prior to enlistment, but
is affected by experiences and developmental opportunities that occur
throughout a soldier's life-cycle in the Army. The focus of the present
research is on the performance-relevant consequences of a soldier's interaction
with his or her superiors. Longitudinal research indicates that
the quality of leader-subordinate work relationships is predictive of
job success (Wakabayashi & Graen, 1984). Aspects of leader behavior such
as providing rewards and recognition, disciplinary practices, and inspirational
leadership have been related to subordinate effort and performance
(e.g., Yukl, 1981).

However, past research on leadership and performance has generally
omitted the influence of ability or the potential interactive effect between
individual aptitudes and leadership on job proficiency and performance.
Some investigations (e.g., Barnes, Potter, & Fiedler, 1983) have
suggested that the prediction of job performance from general ability is
moderated by leadership. Other researchers (Schmidt & Hunter, 1977) have
argued that the relationship between general ability and performance is
stable across time and situations for similar jobs.

To summarize, the model examined in this research assumes that job
performance is influenced by a new incumbent's capabilities measured prior
to enlistment and characteristics of the work environment. Within this
framework the purpose of this research was twofold: (a) to examine relationships
among dimensions of leader behavior and subordinate performance,
and (b) to explore possible moderating effects of leadership on the correlation
between general cognitive ability and job performance.

METHOD 

Research participants were 696 first term soldiers in five military
occupational specialties (MOS): 156 infantrymen (MOS 11B), 139 armor
crewmen (MOS 19E), 125 radio teletype operators (MOS 31C), 141 light wheel
vehicle mechanics (MOS 63B), and 135 medical care specialists (MOS 91A).


1 The views expressed in this paper are those of the authors and do not
necessarily reflect the views of the U.S. Army Research Institute or the
Department of the Army.

Of these soldiers, 88.5% were male and 11.5% were female; 28% were black,
3% were hispanic, 64% were white, and 5% other. Soldiers' reports of work
experience in their unit ranged from 2 months to 49 months (median = one
year).

Instruments 


The  first  step  in  this  research  was  to  develop  measures  of  leader 
behavior  and  soldier  performance  on  the  job. 

Supervisor behavior rating scales. Critical incidents workshops were
conducted with 80 NCOs in the five target MOS. These NCOs generated a total
of 474 examples of leader behaviors thought to influence soldier performance.
Classification of the incidents by two of the authors and 31 NCOs
familiar with Army leadership requirements led to the identification of 9
categories of leader behavior (White, Gast, Sperling, & Rumsey, 1984). At
least 5 and no more than 8 items were written to represent important
leader behaviors in each category (e.g., "Your supervisors are hard to find
when you need them"). These procedures resulted in a 60-item questionnaire.
Responses to each item were made on a 5-point scale from "very seldom
or never" (1) to "very often or always" (5).

Job performance rating scales. To develop these scales, critical
incident workshops were conducted in which NCOs provided examples of effective
(as well as ineffective) soldier performance. The numbers of NCOs and
examples provided were as follows: MOS 11B, 51 NCOs and 906 incidents;
MOS 19E, 43 NCOs and 798 incidents; MOS 31C, 45 NCOs and 830 incidents;
MOS 63B, 49 NCOs and 882 incidents; and MOS 91A, 42 NCOs and 783 incidents.
A variant of the behaviorally anchored rating procedure (Smith &
Kendall, 1963) was used to develop behavior-based rating scales for each
job. The resulting rating form for each job consisted of seven to ten
7-point behavior summary scales.

Army-wide performance rating scales. To prepare these scales, 77
NCOs and junior officers working in a wide variety of Army jobs generated
1,215 behavioral examples. The examples represent those aspects of soldier
effectiveness that contribute, broadly speaking, to organizational
effectiveness, such as following orders and regulations. The target criterion
space for these scales went beyond job performance to include aspects
of socialization and commitment to the organization. Eleven 7-point
[...]

Hands-on, task proficiency tests. For each of the jobs, 5-8 critical
tasks were identified to represent the MOS-specific task domain. Multi-step
task proficiency tests were prepared for each task. Each step of a
task was scored pass or fail. A score for each task was computed by
calculating the proportion of steps passed, and the task scores were averaged
to yield an overall hands-on test score.
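
The hands-on scoring rule described above can be sketched as follows. The step results are hypothetical, and the function names are illustrative rather than taken from the project.

```python
def task_score(steps_passed):
    """Proportion of steps passed on one task (each step scored pass or fail)."""
    return sum(steps_passed) / len(steps_passed)

def hands_on_score(tasks):
    """Overall hands-on score: the unweighted mean of the task scores."""
    return sum(task_score(steps) for steps in tasks) / len(tasks)

# Hypothetical results for one soldier on three multi-step tasks.
example_tasks = [
    [True, True, False, True],  # 3 of 4 steps passed
    [True, False],              # 1 of 2 steps passed
    [True, True, True],         # 3 of 3 steps passed
]
overall = hands_on_score(example_tasks)
print(overall)
```

Averaging task proportions gives every task equal weight regardless of how many steps it contains, which differs from pooling all steps into a single proportion.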

Job knowledge tests. Through job analysis, important knowledge areas
were identified for each of the five jobs. With the help of subject matter
experts, items were written to tap these knowledges. For each soldier,
the percentage of correct items was the overall job knowledge test
score.



Correlations of hands-on and job knowledge test scores, job performance
ratings, and the Army-wide effectiveness rating with the leader behavior
scales are presented in Table 2. Results are shown separately for
each of the five jobs. A mean correlation (r) across the five jobs was
computed by weighting each correlation by its associated sample size
(Hunter, Schmidt, & Jackson, 1982). The highest correlations were


Table 2

Correlations between Leadership Scales and Criterion Measures by Army Job

                                     Leadership Scale
Job     1       2       3       4       5       6       7       8       Total Scale

Hands-on Task Proficiency
11B     .05     .03     .19*    a       .24     •       .17.    a       .25     .10     .18*
19E     .14     .11     -01.    .14,    .24     .11     .18     .23     .22
31C     .02     .00     .15     .18     .02     .00     .03     .15     .09
63B     .12     -.05    .10     .06     .00     .05     .06     .12     .09
91A     -.05    -.14    -.12    .02.    -.04    -.06.   -.09    -.06.   -.10.
r       .05     -.02    .06     .13     .07     .09     .05     .10     .09

Job Knowledge
11B     -.01    -.17*   .02     .13     .03     .15*    .09     .12     .03
19E     -.03    -.03    .05     .03.    .06.    -13.    -.02    -.03    -02,
31C     .17     .11     .12     .30     .26.    .23     .12     .17     .22
63B     -.05    -.04    .05     .05     .20     -05.    .01,    -.06.   -.01
91A     -.12    -.10    -.01    -01.    -.11    -.18    -.22    -23     -.13
r       -.01    -.06    .04     .09     .06     .00     .00     -.01    .01

Job Performance Rating
11B     .12     .01     .>>5    .23     .06     .21     .10     .03,    .13
19E     .04     .09     .16     .06,    .05     .11     .21     .13
31C     .21     .01     -.04.   .12     .20,    .17     .02     .07     .14,
63B     .17     .06     .23     .08     .20     .07     .07     .06     .18
91A     .06.    .07     .05     .00,    .08,    -.12    .01     .03,
r       .13     .04     .06     .14     .11     .11     .04     .06     .12

Army-wide Effectiveness Rating
11B     .17     .06     .04     .23     .12,    .20     .11     .07     .’7.
19E     .12e    .02,    .07     .’4.    .22a    .10,    .14,    .15,    .15,
31C     .4,a    .19     .12a    .34     .32.    .32     .18     .24     .?7.
63B     .20,    .13     .30     *11,    .19     .06     .04     .06     .21
91A     .'7.    .05,    .’4.    .20,    .07,    .09,    -.09    .07,    .’2.
r       .21     .09     .13     .20     .18     .15     .07     .12     .20

Note. Leadership scales: 1 (Support); 2 (Informing); 3 (Fairness); 4 (Participation);
5 (Performance Contingencies); 6 (Role Clarification); 7 (Results Orientation);
8 (Training & Development); 9 (Total).

General cognitive ability. The Armed Services Vocational Aptitude
Battery (ASVAB) was administered to all participating soldiers prior to
entering military service. The ASVAB, which consists of ten subtests, is
used for selection and occupational classification. A composite measure
of four ASVAB subtests, known as the Armed Forces Qualification Test
(AFQT), was used as the measure of general cognitive ability.

Procedure 


Raters were trained to use the behavior-based rating scales. After
training, supervisors in groups of 3-15 evaluated their subordinates on
the Army-wide and job performance rating scales. The mean number of
supervisor raters per ratee ranged from 1.66 to 1.83 for the five MOS. Ratings
were averaged across supervisor raters to form an overall job performance
rating and an Army-wide effectiveness rating for each ratee.

The first term soldiers (ratees) completed the supervisor behavior
rating scales, and were also administered tests of job knowledge and
hands-on task proficiency.


RESULTS 

Principal components factor analysis was used to examine the
dimensionality of the supervisor behavior rating scales. Varimax and
promax solutions were computed and the interpretation restricted to factors
appearing in both solutions. Comparison of the rotated structures
yielded eight factors with eigenvalues greater than one. Items loading
above .4 on one and only one factor were interpreted as measuring the
factor. Items with weak loadings on all factors or similar loadings on
two or more factors were not used to measure any factor. Factor score
estimates were computed by unit weighting and summing individuals' responses
to the set of items representing each factor. Table 1 presents
the intercorrelations among the estimated factor scores.
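
The item-selection and unit-weighting rules can be sketched in a few lines. The loadings and responses below are hypothetical, and the sketch simplifies "similar loadings on two or more factors" to "more than one loading above the cutoff", which is an assumption about how the rule was applied.

```python
def assign_items(loadings, threshold=0.4):
    """Map each factor index to the items that load above threshold on it alone.

    loadings: dict of item name -> list of rotated loadings, one per factor.
    Items with no loading above the threshold, or with more than one,
    are left out of every scale.
    """
    assignment = {}
    for item, row in loadings.items():
        salient = [k for k, value in enumerate(row) if value > threshold]
        if len(salient) == 1:
            assignment.setdefault(salient[0], []).append(item)
    return assignment

def unit_weighted_scores(responses, assignment):
    """Factor score estimate: the plain sum of responses to a factor's items."""
    return {factor: sum(responses[item] for item in items)
            for factor, items in assignment.items()}

# Hypothetical loadings of five items on two factors.
loadings = {
    "q1": [0.62, 0.10],
    "q2": [0.45, 0.48],  # loads on both factors, so it is dropped
    "q3": [0.05, 0.71],
    "q4": [0.20, 0.15],  # weak on both factors, so it is dropped
    "q5": [0.55, 0.30],
}
responses = {"q1": 4, "q2": 3, "q3": 2, "q4": 5, "q5": 1}  # 5-point ratings
assignment = assign_items(loadings)
scores = unit_weighted_scores(responses, assignment)
print(assignment, scores)
```

Unit weighting ignores the size of each loading once an item is assigned, which keeps the scale scores simple sums of the 5-point responses.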

Table 1

Intercorrelations Among Leadership Scales and Summary Statistics

                                                                               No. of  Scale   Std.
Scale                            1    2    3    4    5    6    7    8    9     Items   Mean    Dev.
1. Support/Inspiration         * * if     .♦8  .64  .55  .72  .58  .68  .90      9     27.2     7.5
2. Informing                              .54  .49  .55  .54  .46  .50  .79      6     19.0     4.8
3. Fairness                               .74  .47  .46  .40  .51  .56  .67      5     16.5     4.5
4. Participation                               .70  .44  .60  .46  .56  .76      4     15.4     5.4
5. Performance Contingencies                        .55  .47  .40  .59  .67      5      9.9     2.5
6. Role Clarification                                    .78  .55  .65  .80      4     12.9     5.1
7. Results Orientation                                        .56  .59  .66      5      9.4     2.2
8. Training and Development                                        .72  .77      5     14.7     5.9
9. Total                                                                .94     59    125.1    24.7

Note. Internal consistency reliabilities are presented on the diagonal.

N = 696
obtained between perceptions of leader behavior and the Army-wide effectiveness
ratings. Within the set of Army-wide performance dimensions,
strongest relationships were obtained between supportive and participative
leadership and ratings of subordinate adherence to regulations and willingness
to provide extra effort when needed. Statistically significant
but low correlations between leader behaviors and job proficiency were
evident in the two combat MOS.
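
The sample-size-weighted mean correlation reported in the r rows of Table 2 can be sketched as follows. The sample sizes are the MOS counts given under Method; the correlations are hypothetical, and the sample available for any one criterion may have been smaller than these counts.

```python
def weighted_mean_r(correlations, sample_sizes):
    """Mean correlation across jobs, each r weighted by its sample size."""
    total_n = sum(sample_sizes)
    return sum(r * n for r, n in zip(correlations, sample_sizes)) / total_n

# MOS sample sizes from the Method section (11B, 19E, 31C, 63B, 91A).
sample_sizes = [156, 139, 125, 141, 135]
# Hypothetical correlations for one leadership scale and one criterion.
correlations = [0.20, 0.10, 0.30, 0.15, 0.05]
mean_r = weighted_mean_r(correlations, sample_sizes)
print(round(mean_r, 3))
```

Because the five samples are of similar size, the weighted mean stays close to the simple average of the five correlations.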

Hierarchical regression analysis was used to estimate the relationships
of cognitive ability (i.e., AFQT score), leadership climate, and their
interaction to job proficiency and performance. The AFQT score was entered
first in the regression to assess the contribution of mental ability
at the time of enlistment to later job performance. Then, leadership and
the ability X leadership interaction were entered to assess post-enlistment
leader influences on performance and the utilization of ability on
the job. In the regression model, leadership was represented by the sum
of scores on the 8 leadership scales. The criterion variables were job
knowledge, hands-on task proficiency, and supervisor ratings of job performance
and Army-wide effectiveness.

Of interest here, results of the regression analyses revealed no
statistically significant increase in R² due to inclusion of the ability X
leadership interaction in the model. In each of the five jobs, the highest
multiple correlations were obtained for prediction of job knowledge,
with R = .30 to .60, all p < .05. This effect was primarily attributable to
the influence of general ability on job knowledge. Leadership and cognitive
ability had significant independent effects on task proficiency in
the infantryman and armor crewman jobs with, respectively, R = .28, p < .05,
and R = .37, p < .05. However, in MOS 91A and MOS 63B, Rs for the prediction
of task proficiency from the independent variables failed to reach significance.
With respect to supervisory ratings of job performance, ability
and leadership and their interaction accounted for less than 5% of the
variance in this criterion. Leadership showed several significant correlations
with Army-wide effectiveness ratings at the zero-order level;
however, the R for this criterion achieved significance only in the
radio-teletype operator job, with R = .37, p < .05. Correlations between cognitive
ability and the Army-wide effectiveness rating ranged from r = -.28 to
.03.
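
The hierarchical entry of predictors can be sketched by comparing R² across nested models. The sketch below is illustrative only: the data are hypothetical, the predictors are mean-centered before the product term is formed (a common practice the paper does not mention), and the significance test of the increment is omitted.

```python
def r_squared(predictors, y):
    """R^2 from an ordinary least squares fit of y on the predictors plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(design[0])
    # Normal equations (X'X) b = X'y as an augmented matrix.
    aug = [[sum(design[i][r] * design[i][c] for i in range(n)) for c in range(k)]
           + [sum(design[i][r] * y[i] for i in range(n))] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(aug[r][p]))
        aug[p], aug[pivot] = aug[pivot], aug[p]
        for r in range(k):
            if r != p:
                factor = aug[r][p] / aug[p][p]
                aug[r] = [x - factor * z for x, z in zip(aug[r], aug[p])]
    beta = [aug[r][k] / aug[r][r] for r in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Hypothetical scores for eight soldiers.
ability = center([35, 42, 50, 58, 63, 71, 80, 90])             # AFQT-like score
leadership = center([110, 95, 130, 120, 100, 140, 105, 125])   # sum of leadership scales
performance = [48, 50, 57, 60, 58, 68, 66, 74]                 # criterion
interaction = [a * l for a, l in zip(ability, leadership)]

r2_ability = r_squared([ability], performance)                  # step 1
r2_main = r_squared([ability, leadership], performance)         # step 2
r2_full = r_squared([ability, leadership, interaction], performance)  # step 3
increment = r2_full - r2_main
print(round(r2_ability, 3), round(r2_main, 3), round(r2_full, 3), round(increment, 3))
```

The quantity of interest for the moderator hypothesis is the last increment: if adding the product term raises R² by only a trivial amount, there is no evidence that leadership changes the ability-performance relationship.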

DISCUSSION 

[...] cognitive ability, and the performance of first term enlisted soldiers. Results
for the five Army jobs examined here support the conclusion that
general ability and leadership behavior have independent effects on performance.
However, each appears to contribute to effective soldiering in
different ways. Leadership, as perceived by the subordinate, had the
strongest effect on the motivation-related, dependability facets of performance
measured by the behaviorally based rating scales. General cognitive
ability contributed to performance by enabling enlistees to learn the
facts and procedures required to perform their jobs.

No evidence was obtained indicating that relationships between
general ability and job proficiency and performance are moderated by
leadership influences. This finding supports conclusions by Schmidt and
Hunter (1977) that the validities of cognitive tests are [...]
situations for the same job. Correlations between general cognitive
ability and each criterion measure did vary somewhat across jobs, [...]
all of the variation was attributable to sampling error.

The relationships between leadership and performance rep[...]
should not be interpreted as indicating that leadership behav[...]
performance. Leadership effects on performance may be unders[...]
of exchange theory (Graen, 1976), which views the interaction [...]
leader and subordinate as a reciprocal influence process that [...]
over time. Subordinates who are perceived as willing to work [...]
support the mission will be evaluated more favorably by their [...]
In return for their support, these soldiers are likely to rec[...]
individualized attention, information, and other resources fr[...]
supervisors; which, in turn, serves to reinforce and sustain [...]
effort.

The results reported here are largely exploratory. Future data
collection and analysis will provide an opportunity to confirm the leadership
factors and to examine potential moderating effects of leadership behavior
on a broad range of soldier aptitudes and characteristics.

REFERENCES 

Barnes, V., Potter, E. H., & Fiedler, F. E. (1983). Effect of [...]sonal
stress on prediction of academic performance. Journal of
Applied Psychology, 68, 686-697.

Graen, G. (1976). Role-making processes within complex organ[...]
M. Dunnette (Ed.), Handbook of Industrial Organizational [...]
Chicago: Rand McNally.

White, L. A., Gast, I. F., Sperling, H. M., & Rumsey, M. G. (1984).
Influence of soldiers' experiences with supervisors on perf[...]
during the first tour. Paper presented at meeting of the M[...]
Testing Association, Munich, Germany.

Yukl, G. A. (1981). Leadership in organizations. Englewood Cli[...]
Prentice-Hall.


MEASURING MILITARY HEROISM


Jeffrey  W.  Anderson 
U.S. Army Research Institute

To a large extent the drama of human history has been woven from the
[...] of people who by the nature of their successes beyond the pattern
of everyday life assumed a mythical or legendary stature and became national
heroes (Downton, 1973). During periods of revolutions and wars the fate of
entire peoples has seemed to hang upon the decisions of a few powerful people.
Yet, Arthur M. Schlesinger, Charles de Gaulle, and others have remarked
that the age of heroes is past (Jennings, 19[...]). Yet it is also possible that
the character of our heroes, or what we recognize as heroism, has changed. We
no longer regard our heroes as rare specimens of humanity. They are only
somewhat different from ourselves, generally due to the situation or circumstances
surrounding the act of heroism. While gallant acts have given birth
to legends of heroes, scientific research in the area of heroism has been
[...] scarce. The present study is a preliminary attempt to examine four
research questions. First, what are the ingredients of heroism? Second, do
people still recognize and label certain behaviors as heroic? Third, have
those heroic characteristics changed over time? And fourth, can we measure
heroism before a crisis?

Heroism involves great bravery, daring courage, valor, gallantry, and
intrepidity (Barnhart, 1967). It includes elements of extreme self-sacrificing
courage, fulfilling a high purpose, and attaining a noble end (Webster,
[...]). But these qualities do not readily lend themselves to measurement and
research. In the literature on heroism there are three major foci. The
first emphasizes the personality of the hero. It contends that the hero
would have been socially recognized as a hero without regard for the circumstances
surrounding his behavior. The second focus follows the arguments of
Hegel and Spencer. Man is considered a mere product of social forces
coincidentally combined in time and space to converge upon an individual that
society labels as a hero. Finally, early social reformers and revolutionaries,
emphasizing Darwinian concepts, contended that heroes were thrown up by
some chance of the natural selection process. The social environment was a
selection instrument to provide opportunities for these men to display their
hereditary talents in heroic acts. The situation limited the hero, but did
not dominate him.

In general, the literature of heroism tells us little about the type of
person who is predisposed, with situation permitting, to become a military
hero. In other words, while we all seem to know about heroism there is no
operational definition of the concept. To attempt to provide an operational
definition, we systematically analyzed the citations for heroism in which the
recipient was given the highest military award, the Medal of Honor. These
citations should also represent our society's general definition of heroic
action. The Medal of Honor is presented to a service member who "distinguished
himself conspicuously by gallantry and intrepidity at the risk of his
life above and beyond the call of duty." As the highest military award for
bravery, each branch of the armed forces has established a set of prescriptive
regulations that leave no margin of error or doubt. The deed of the
recipient must be proved by incontestable evidence from at least two eyewitnesses,
and it must be so outstanding that it clearly distinguishes his gallantry
beyond the call of duty from lesser forms of bravery. As such, the
citations from Medal of Honor winners are essentially narratives of critical
incidents of heroism and well-suited to the development of an operational
definition of military heroism (Flanagan, 1954).


The views expressed in this paper are those of the author and do not necessarily
reflect the view of the US Army Research Institute or the Department
of the Army.

Method

At the time of this study the Medal of Honor had been awarded 3369 times
spanning the years 1863 to 1978. From the citations given with these awards
337 were randomly selected for analysis.

A group of three judges, two male and one female, working independently,
analyzed these citations to determine the critical behaviors exhibited by the
hero that were recognized by his fellow soldiers and rewarded by the
military. The eight dimensions of behavior that were unanimously listed by all
three judges were accepted as truly describing military heroism.

Another group of five judges, two female and three male, were then asked
to rate each of these dimensions concerning how well it described military
heroism as exemplified by a group of 254 randomly selected different
citations. These additional citations were chosen in an effort to broaden the
sample and thereby improve the generalizability of the dimensions and
findings. In order to achieve these objectives ten percent of the remaining
citations, or 303 citations, were randomly selected. Since many of these
citations were verbatim duplicates of others, redundant citations were
eliminated, giving 254 citations for further study. These judges had extensive
military experience and represented the Army, Navy, and the Air Force. Each
judge was asked to rate on a 1 to 5 scale the extent to which the derived
eight dimensions described the behavior of the award recipient in each
citation (from not at all descriptive to totally descriptive).

Since the judges used the rating scales differently, each rating score
was converted to a standard score using the overall mean and variance of the
judge giving the rating. The interrater reliabilities, using Winer's
technique and corrected for five judges, were calculated. Based on these
reliabilities, the ratings from all five of the judges were combined and a mean
rating was calculated for each performance dimension. The more accurately a
dimension described the concept of heroism as defined by a specific citation,
the higher the judge's rating on that dimension. The dimensions were ranked
in order of their overall importance in describing military heroism.
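
The standardization step described above can be illustrated with a minimal
sketch. It assumes each judge's ratings are held in a plain list in the same
citation order, and that a judge's own mean and population standard deviation
are used; the function names and the sample data are hypothetical and are not
taken from the study. It does not implement Winer's reliability procedure.

```python
from statistics import mean, pstdev


def standardize_by_judge(ratings):
    """Convert each judge's raw 1-5 ratings to standard (z) scores.

    ratings maps a judge label to a list of raw ratings, one per citation.
    Each judge is standardized against that judge's own mean and spread,
    so a lenient and a strict judge become comparable.
    Assumes every judge used at least two different rating values.
    """
    z_scores = {}
    for judge, scores in ratings.items():
        judge_mean = mean(scores)
        judge_sd = pstdev(scores)
        z_scores[judge] = [(s - judge_mean) / judge_sd for s in scores]
    return z_scores


def combined_rating(z_scores):
    """Average the standardized ratings across judges, citation by citation."""
    judges = list(z_scores)
    n_citations = len(z_scores[judges[0]])
    return [mean(z_scores[j][i] for j in judges) for i in range(n_citations)]


# Hypothetical data: judge B rates everything two points higher than judge A.
raw = {"A": [1, 2, 3], "B": [3, 4, 5]}
z = standardize_by_judge(raw)
combined = combined_rating(z)
```

After standardization the two hypothetical judges produce identical scores,
which is the purpose of the conversion: differences in how each judge used the
scale no longer affect the combined rating.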

Subsequently, ratings were categorized according to the conflict from
which the Medal of Honor citation was drawn. A comparison between conflicts
of the relative importance of each dimension to the concept of military heroism
was used to test the implied hypothesis that our societal definition of
heroism had changed over time.

Results 

The first group of three judges were asked to independently derive
dimensions of military heroism based on a randomly selected sample of 337
citations from the Medal of Honor. Dimensions that were unanimously selected
by the judges were used to describe each citation. All citations could be
described using one or more of the dimensions presented in Table 1.


Table 1

Dimensions of Heroism, in Order of Importance, with Interrater Reliabilities


Dimension                                                          Reliability

The hero is devoted to accomplishing his duty.
   (Devotion to Duty)                                                  .?7

The hero sets a personal example of behavior for others.
   (Personal Example)                                                  .43

The hero risks his own life or places himself in danger.
   (Accepting Danger)                                                  .76

The hero rescues or saves another person.
   (Saving Life)                                                       .92

The hero overcomes his own injuries or illness.
   (Overcoming Injury)                                                 .93

The hero succeeds when the odds are overwhelmingly against him.
   (Defeating Great Odds)                                              .71

The hero takes command or gives leadership when it is lacking.
   (Taking Command)                                                    .73

The hero seizes upon an opportunity.
   (Seizing an Opportunity)                                            .37


Essentially, the military hero had to set an example of behavior before
and after formal recognition by the organization. He persisted in the
accomplishment of his duty and willingly accepted personal danger,
subordinating his own life to the values of his cause. Given the opportunity
to command (leadership) he did so, and when opportunity presented itself he
seized upon it.

These dimensions did not, however, demonstrate any relative importance in
describing  heroism.  A  second  set  of  judges,  therefore,  was  asked  to  rate  on  a 
scale  of  1  to  5  how  important  each  dimension  was  in  describing  the  heroism 
expressed  by  each  of  254  different,  randomly  selected  citations.  After  con¬ 
verting  the  ratings  given  by  each  judge  to  standard  scores,  the  interrater 
reliabilities  for  each  dimension,  using  Winer's  technique  and  corrected  for 
five  judges,  were  calculated.  These  reliabilities  are  also  presented  in 
Table  1.  While  the  first  two  dimensions  show  lower  reliability  than  other 
dimensions,  all  are  considered  acceptable  for  untrained  raters,  especially  in 
light  of  the  unanimous  agreement  given  to  these  two  dimensions  by  the  origi¬ 
nal  three  judges. 

Based on these reliabilities, the judges' evaluations were combined. The
results  of  this  combination  yielded  the  order  of  importance  for  the  dimen¬ 
sions  as  shown  in  Table  1.  In  all  cases,  the  five  judges  agreed  that  the 
dimensions  previously  constructed  to  describe  heroism  were  adequate  descrip¬ 
tors  of  heroism. 

To test the implied hypothesis that military heroism has changed over
time,  each  of  the  judges'  ratings  were  grouped  according  to  the  time  period 
of  the  citation  from  which  the  rating  was  derived  and  average  ratings  for 
each  dimension  were  calculated  for  seven  major  conflict  periods  from  the 
Civil  War  through  the  Vietnam  conflict.  A  profile  analysis  of  these  average 
ratings  by  conflict  period  is  shown  in  Figure  1. 
[...] the Warrior Spirit encompasses [...] essential to successful [...] of
military history, in a [...]


1) A selfless devotion to accomplishing a duty or perceived noble
cause.

2) Leadership by personal example--especially applying high but
achievable standards to himself and his unit.

3) A reasoned acceptance of risk (especially risk of his own life)--
calm, confident and self-controlled in the face of mortal danger.

4) Decisiveness despite unreliable, incomplete and often inaccurate
information (ability to separate the important from the trivial).

5) Being effective at communicating instructions so that every member
of the unit knows and understands the leader's wishes.

6) Creating a team or cohesive unit that all work as one to achieve
the noble cause or purpose and training that unit for combat.

The Warrior Spirit, then, appears to be a combination of native
characteristics and training. The Fighter Studies (HumRRO, 1957-1958) examined
effective combat soldiers during the Korean conflict. They found that a
fighter (warrior) tended to (1) be more intelligent, (2) be a "doer",
(3) have greater emotional stability, (4) have better health and vitality,
and (5) have a greater fund of military knowledge. In 1979 and 1980 studies,
Anderson found that successful combat leaders were more intelligent,
task-oriented, had higher morale, and had more direct, job-related experience
than their less successful peers.

An historical analysis by the Department of History, USMA (1984) found
that successful combat leaders had: (1) terrain sense, (2) single-minded
tenacity - moral courage, (3) ferocious audacity - willing acceptance of
reasoned risk, (4) physical confidence, and (5) practical, practiced
judgement - common sense.

Based on these studies the hero may be selected in peacetime based on his
intelligence, moral courage, character, mental and emotional health, physical
well-being (medical and athletic), decision-making ability, common sense, and
self-confidence.

Conclusions 

This paper has presented a simple, empirical investigation to determine
the operational definition of military heroism. It has corroborated this
definition with findings from other, related research. Though current
psychological literature has little information concerning the psychological
profile of a hero, there are certainly indications from the analysis of a
different body of literature that military heroism is readily recognizable
for others and that the hero has certain measurable characteristics that
distinguish him from common humanity. We have shown the underlying factors
of heroism and in conjunction with related research propose that certain
factors may be measured and used to predict military heroism. Since this
study is the first of its kind, there are admitted imperfections, but it
demonstrates conclusively that heroism has a psychological meaning for the
average individual which may be scaled and reliably measured.
ANALYSIS OF FUNCTIONAL TRAINING PROGRAMS
AT THE NAVAL SUPPLY CENTERS


Neal R. CROWLEY
U.S. Naval Audit Service


In  fiscal  1980-1981  $223,649,000  was  misplaced,  lost,  or 
stolen  from  the  Naval  supply  centers  and  management  indicators 
tracking  worker  performance  declined.  Congressional  concern 
resulted  in  several  General  Accounting  Office  (GAO)  audits  and 
numerous  recommendations  for  improvement.  In  fiscal  1982 
performance  indicators  improved  and  $48,974,000  in  misplaced 
material  was  located. 

Responding to strong Congressional and GAO criticism for
poor inventory accuracy, the Naval Supply Systems Command
(NAVSUP) instituted several programs to improve the performance
of warehouse workers. As part of the overall effort, NAVSUP
started a functional training effort called Competency Based
Certification (CBC). The CBC program increased training
staffs, improved training facilities, provided money for more
training, and required the supply centers to develop job related
training materials for their warehousing functions and to fully
train all employees. CBC required supervisors to "certify"
that employees could perform their jobs.

Scope  of  the  CBC  Program 

Approximately  1,500  warehouse  workers  at  seven  supply 
centers  received  79,776  hours  of  CBC  training  from  1  August 
1982  until  31  December  1984.  NAVSUP  spent  almost  $3,500,000 
during  this  time  for  contractor  developed  CBC  training 
materials  and  $500,000  for  classroom  renovation,  printing, 
etc. NAVSUP hired 45 new training employees to implement the
program and issued a new $2,000,000 training contract because
of management's belief in the success of the program.

CBC  training  is  very  expensive.  Contractor  costs  alone 
exceed  $5,000,000  so  far  for  a  per  person  cost  of  over  $3,333. 
79,776  hours  of  CBC  training  equals  slightly  over  53  hours  per 
employee.  NAVSUP  paid  an  additional  $970,000  for  employee 
salaries  while  they  were  being  trained  and  almost  as  much  for 
the  salaries  of  program  administrators  and  instructors.  Spread 
over  the  27  months  of  the  program  studied,  53  hours  of  CBC 
training  gave  each  employee  an  average  of  two  hours  per  month. 
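
The per-person figures quoted above follow directly from the totals in the
text. The short sketch below simply reproduces that arithmetic; the variable
names are illustrative, and the worker count is the approximate 1,500 given
in the text.

```python
# Totals as stated in the text.
contractor_cost = 5_000_000   # dollars, "exceed $5,000,000 so far"
workers = 1_500               # approximate number of warehouse workers
training_hours = 79_776       # CBC training hours delivered
months = 27                   # months of the program studied

cost_per_person = contractor_cost / workers        # contractor dollars per worker
hours_per_employee = training_hours / workers      # CBC hours per worker
hours_per_month = hours_per_employee / months      # CBC hours per worker per month
```

The results are about $3,333 per worker, about 53.2 hours per worker, and
just under two hours per worker per month.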

Currently,  more  hours  go  into  administering  the  program  and 
developing  training  materials  than  go  into  employee  training. 

There are two training programs. One is based on
contractor and supply center training materials. Another is
physical distribution training. This consists primarily of
greatly intensified training along traditional lines.
Employees go to existing training courses ranging in size from
one hour briefings to six month professional training courses.
The supply centers conducted 226,429 hours of physical
distribution training at a salary cost of $2,750,000. The cost
of travel, per diem, tuition, materials, etc., for physical
distribution training was approximately $500,000.
Administrative costs were small.

This  evaluation  concentrates  on  the  time  spent  on  training 
and  its  effect  on  worker  quality,  timeliness,  and 
productivity.  Has  the  CBC  program  worked?  To  find  out  I 
analyzed  the  program  at  all  seven  supply  centers.  The  basic 
question  is  whether  the  amount  of  training  is  related  to  supply 
center  performance  as  measured  by  management  indicators.  If 
training  changed  performance,  then  the  change  must  show  up  in 
performance  and  productivity  indicators. 

Although  assessing  the  effectiveness  of  training  is  the 
main  objective,  I  also  evaluated  the  relative  efficiency  of  the 
two  training  programs. 

The  primary  hypothesis  is  that  there  is  a  strong,  or  at 
least  moderate,  statistical  relationship  between  training  and 
performance.  A  secondary  hypothesis  suggests  that  the 
relationships  between  quality,  timeliness  and  productivity  can 
be  explained  or  depicted  in  a  logical  and  statistically 
significant  way. 

Variables  Used 

I conducted a time series analysis looking for interrelationships
between nine variables: quality, timeliness, both quality and
timeliness (Q & T), work unit productivity, receipt
and  issue  productivity,  overtime,  each  type  of  CBC  training, 
and  a  combination  of  all  training.  The  quality  variable 
measures  how  accurately  warehouse  workers  perform  their  jobs. 
The  timeliness  variable  measures  how  quickly  workers  process 
materials.  The  Q  &  T  variable  measures  both  accuracy  and  speed 
of  worker  performance.  Two  variables  measure  worker 
productivity  and  one  measures  overtime. 

I  used  four  types  of  analysis:  first  looking  for  a 
statistically  significant  change  in  performance;  second 
evaluating  the  change  by  comparing  the  seven  supply  centers; 
and linear and multiple regression analysis for each supply
center.

How  I  Computed  the  Variables 

Three  variables  measure  the  accuracy  and  speed  of  warehouse 
worker  performance.  The  means  of  five  indicators  make  up  the 
quality  variable  for  worker  performance.  The  means  of  six 
indicators  make  up  the  timeliness  variable  for  worker 
performance.  The  composite  mean  for  all  eleven  indicators 
makes  up  the  Q  &  T  variable.  I  took  work  unit  and  manhour 
statistics  from  official  documents. 
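
A minimal sketch of how the three performance variables could be formed from
the indicator values is given below. It assumes the indicators are already on
a common scale (for example, percentages) for a given period; the function
name and sample numbers are hypothetical. Note that, as described above, the
Q & T variable is the mean of all eleven indicators, which is not the same as
the mean of the quality and timeliness variables.

```python
from statistics import mean


def composite_variables(quality_indicators, timeliness_indicators):
    """Return (quality, timeliness, q_and_t) for one reporting period.

    quality_indicators: the five quality indicator values.
    timeliness_indicators: the six timeliness indicator values.
    """
    quality = mean(quality_indicators)
    timeliness = mean(timeliness_indicators)
    # Composite mean over all eleven indicators taken together.
    q_and_t = mean(list(quality_indicators) + list(timeliness_indicators))
    return quality, timeliness, q_and_t


# Hypothetical indicator values for one month at one supply center.
quality, timeliness, q_and_t = composite_variables(
    [90, 92, 94, 96, 98],
    [80, 82, 84, 86, 88, 90],
)
```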

Threats  to  Internal  Validity 

Several  factors  threaten  the  internal  validity  of  the 
analysis.  Norfolk  changed  its  receipt  and  issue  testing 
procedures  to  stop  counting  previously  recorded  errors.  NAVSUP 
program  managers  caught  two  supply  centers  fudging  results. 
Cheating may be widespread. Some data is missing. But,
overall,  the  data  used  appears  adequate. 

Changes  in  procedures,  policies,  and  equipment  take  place 
constantly  at  each  supply  center.  The  use  of  seven  supply 
centers  as  comparison  groups  helped  in  isolating  the  probable 
cause  of  changes. 

The  turnover  rate  varies  at  each  supply  center  and  this 
undoubtedly  influences  performance.  Unfortunately  NAVSUP  does 
not  keep  turnover  rates  by  department  or  type  of  work. 

Small  Sample  T  Test 

I  used  a  standard  small  sample  t  test  to  evaluate  the 
change  in  performance  before  and  after  training  for  the  six 
non-training  variables.  The  null  hypothesis  is  that  the 
difference  between  the  means  before  and  after  training  is  due 
to  chance.  The  alternative  hypothesis  is  that  a  significant 
change  took  place.  On  the  basis  of  a  one-tailed  test  at  the 
.01  level  of  significance,  I  would  reject  the  null  hypothesis 
if  a  t  were  greater  than  t.99,  which  for  12+9-2  =  19  degrees  of 
freedom  is  2.539.  On  the  basis  of  a  one-tailed  test  at  a  .05 
level  of  significance,  I  would  reject  the  null  hypothesis  if  t 
were  greater  than  t.95,  which  for  19  degrees  of  freedom  is 
1.729. 
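
The decision rule above can be sketched as follows. This is an illustration
under stated assumptions, not the author's computation: it assumes the
standard pooled-variance form of the small sample t test, and the function
names are hypothetical. The critical values are the ones quoted in the text
for 19 degrees of freedom, so the decision helper applies only to samples of
12 and 9 observations.

```python
from math import sqrt
from statistics import mean, variance

# One-tailed critical values for 19 degrees of freedom, as quoted in the text.
T_CRIT_19DF = {0.05: 1.729, 0.01: 2.539}


def pooled_t(before, after):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(before), len(after)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(before) + (n2 - 1) * variance(after)) / df
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(after) - mean(before)) / std_err, df


def reject_null(t, alpha=0.05):
    """One-tailed decision against the 19-degree-of-freedom critical value."""
    return t > T_CRIT_19DF[alpha]
```

With 12 observations before training and 9 after, `pooled_t` returns 19
degrees of freedom, matching the 12 + 9 - 2 calculation in the text.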

Overall,  there  is  a  significant  difference  between  the 
means  before  and  after  training.  I  therefore  rejected  the  null 
hypothesis.  I  evaluated  the  total  change  in  performance  for 
each  supply  center  and  conducted  linear  and  multiple  regression 
analysis  to  see  if  the  change  is  related  to  training. 

Cumulative  Change  in  Performance 

I  computed  the  cumulative  change  in  performance  since  the 
CBC  program  began  for  each  supply  center  and  then  used  linear 
regression  analysis  with  the  change  in  performance  as  the 
dependent  variable  and  training  as  the  independent  variable.  A 
minimum of F = 6.61 is required for significance at the 95%
confidence level. F test results have five degrees of freedom.
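
As an illustration of this test, the sketch below fits a simple linear
regression across seven supply centers and computes the F statistic from the
coefficient of determination, F = r^2 / (1 - r^2) x (n - 2). With seven
centers there are n - 2 = 5 error degrees of freedom, which is where the 6.61
threshold applies. The function name and the data are hypothetical; the
actual supply center figures are not reproduced in the text.

```python
from statistics import mean

F_CRIT_1_5 = 6.61  # 95% critical value for 1 and 5 degrees of freedom (from the text)


def regression_f(x, y):
    """Simple linear regression of y on x; returns (slope, r_squared, f_stat)."""
    n = len(x)
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    # Assumes the fit is not perfect (r_squared < 1).
    f_stat = r_squared / (1 - r_squared) * (n - 2)
    return slope, r_squared, f_stat


# Hypothetical data: training hours (x) and change in performance (y)
# for seven supply centers.
training = [1, 2, 3, 4, 5, 6, 7]
change = [2, 1, 4, 3, 6, 5, 8]
slope, r_squared, f_stat = regression_f(training, change)
significant = f_stat > F_CRIT_1_5
```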

The  correlation  coefficients  between  CBC  training  and 
changes  in  performance  are  very  low.  CBC  training  did  not  come 
close  to  passing  the  F  test  for  any  performance  indicator.  The 
hypothesis  requires  that  the  more  training,  the  greater  the 
improvement  in  performance.  The  null  hypothesis  maintains  that 
training  and  changes  in  performance  are  unrelated.  This 
analysis  fails  to  reject  the  null  hypothesis  for  CBC  training. 

Physical  distribution  training  shows  a  much  stronger 
correlation with changes in performance than CBC training
does. However, the only relationship with training that is
statistically significant at the 95% probability level is
receipt and issue productivity. Overall, the combination of
CBC  and  physical  distribution  training  shows  a  stronger 
correlation  than  either  of  the  two  alone,  especially  for 
timeliness  and  Q  &  T. 
There is a statistically significant inverse relationship
between overtime and quality. Even the corrected coefficient of
determination is an impressive .79. The t test result is -4.8,
giving it a probability of less than .005 that the result is
due to chance.

Overtime  appears  to  be  more  directly  correlated  with 
performance  than  training.  As  the  amount  of  overtime  declines 
the  quality  of  performance  improves.  Inversely,  as  the  amount 
of  overtime  increases  quality  declines.  The  relationship 
between  overtime  and  timeliness  is  mixed. 


Linear  Regression  for  Each  Supply  Center 


I  conducted  linear  regression  analyses  and  obtained  the 
following  results. 

Jacksonville's, Norfolk's and Oakland's F test scores for
CBC's correlation with receipt and issue productivity exceed
the 5.59 required for a probability of 95%. No other CBC
correlation [...] There may be a causal relationship
between changes in productivity and CBC training, that is, the
more training, the better the performance, for Jacksonville,
Norfolk, and Oakland but not for Charleston, Pearl Harbor,
Puget Sound, or San Diego.

Puget Sound shows a significant correlation between
physical distribution training and quality. Jacksonville shows
a significant relationship between physical distribution
training and receipt and issue productivity.

Oakland shows a strong correlation with work unit
productivity and Puget Sound shows a strong correlation with
quality. Overall there was a slight improvement with CBC
training removed except at Norfolk, which shows a slight decline.

Overtime  shows  the  strongest  correlations  with 
performance.  Norfolk's  overtime  is  related  to  timeliness, 
receipt  and  issue  productivity,  and  work  unit  productivity  with 
a  probability  of  error  of  less  than  .05%.  By  far  the  strongest 
relationship  (a  probability  of  99%)  is  at  San  Diego  for 
timeliness,  and  Q  &  T. 


General  Trends 


General trends are mixed. Overtime shows an overall
negative correlation with performance but a mixed negative and
positive correlation with productivity. Overtime work hours
result in higher production, but overtime also increases the
overall number of hours required to do a job and results in
lower productivity when the relative portion produced during
overtime falls below the relative portion produced during
regular work time.

Both productivity measures show a mixed negative and
positive correlation with all performance indicators and a
positive correlation with timeliness indicators. Work unit
productivity also shows a positive relationship in this area.
CBC and physical distribution training show a weak overall
inverse relationship with productivity, quality, and timeliness.

Multiple  Regression 

I conducted 57 multiple regression analyses for each supply
center. Mixed results make interpretation difficult but do
suggest some important relationships.

The dependent variables were quality, timeliness, Q & T,
and the two measures of productivity. The strongest
correlation overall exists between overtime and training as
independent variables and performance measures as the dependent
variables. Productivity (or the amount of work done), when used
as an independent variable, is negatively correlated with
quality and positively correlated with timeliness and Q & T.
Training is positively correlated with performance, except at
Oakland where the relationship is negative for all training and
Puget Sound where the relationship is negative for CBC training.

Overtime shows a strong inverse relationship to performance
indicators. In the case of quality this makes sense. It
suggests the more hours people work the more mistakes they
make. It is not clear why timeliness and productivity have
inverse relationships with overtime. Next to overtime, all
training shows the strongest correlation with various dependent
variables. Overall, it is more effective than either CBC or
physical distribution training alone. CBC's inverse
relationship [...] a problem with program administration.

[...] training was poorly planned and executed and
adversely affected performance. The time devoted to CBC
training [...] impacted performance and productivity.

Training's strongest influence is on productivity. Four
[...] physical distribution, and six combined training t
tests [...] the level of significance when used to test
the relationship with productivity. One additional CBC, one
physical distribution, and one combined test were negative at
[...] level.

[...] of training correlates with timeliness. There is
[...] relationship between productivity and timeliness.
Overtime is [...] correlated with timeliness. Puget
Sound appears to have a strong physical distribution training
program. This could be responsible for improvements in
performance. Other than Puget Sound, and discounting Oakland's
inverse relationship, there is not a significant correlation
between training and quality or timeliness of performance for
any supply center.

At Charleston productivity is inversely related to quality
and overtime is inversely related to work unit productivity.

The significant result for Jacksonville is between
training and productivity. Norfolk shows an inverse
relationship between overtime and productivity and a positive
relationship between overtime and timeliness. CBC training is
[...] related to productivity and Q & T.


At Oakland the relationship between training and performance
is generally negative, especially for CBC training. Overtime
is positively related to quality. Experienced workers receive
overtime and they make fewer mistakes than the overall
average. Oakland's workload declined slightly over the last
[...]

Pearl Harbor's overtime is inversely related to
productivity. Workers may be less efficient when working
overtime and do not produce as much as they would have in a
[...] period during normal working hours. Nothing else
is significant for Pearl Harbor.

At Puget Sound CBC training negatively impacted quality and
timeliness. Physical distribution training positively impacted
quality and productivity. Productivity is correlated with
timeliness and Q & T. The inverse correlation between
overtime and timeliness and Q & T is very strong.

San Diego is the only supply center that does not show some
type of relationship with either measure of productivity as a
[...] variable.

Summary 

The premise of CBC is that training will improve worker
quality, timeliness, and productivity. Proper training
improves worker skills and attitudes, resulting in a more
desirable on-the-job behavior, resulting in improved accuracy,
timeliness and productivity. The analysis shows that a
statistically significant change in performance occurred, but
was probably not caused by CBC training. At Puget Sound
physical distribution training appears to have contributed to
improvement in performance and productivity.

The analysis presented in this paper does support some
[...]. Overall:

     CBC training may not be related to performance;

     physical distribution training is positively related to
     Q & T and receipt and issue productivity;

     overtime is inversely related to quality.

Results differ for each supply center but nowhere does
[...] training appear to have significantly impacted warehouse
worker performance.

At Jacksonville physical distribution training and CBC may
be related to productivity. But Jacksonville's workload
increased while the number of workers remained the same,
resulting in an increased output per worker.

At Norfolk CBC may be positively related to productivity.
Norfolk's workload declined while the number of workers
remained stable, resulting in an increase in productivity per
[...]

At Oakland training is negatively related to all
[...] and overtime may be positively related to quality.
Pearl Harbor and Charleston show no relationships at all with
[...]
Commissioned officers are asked to respond to three questions in addition
to providing data on their medical or dental specialty and the type and
location of the facility to which they are presently assigned. Questions for
commissioned officers are as follows:

•  "Should the technician perform this task?"

•  "If yes, how well were technicians prepared to perform this task prior
to job entry?" [Responses are made from extremely low (preparation
level) to extremely high (preparation level) on a scale ranging from
1 to 8.]

•  "At what level is training recommended--familiarization only or
hands-on/thorough knowledge?"

For the last question, the following explanation is provided (NODAC, 1982):

Familiarization means information should be provided on basic facts,
components, capabilities, etc.

Hands-On/Thorough Knowledge means training should be provided which
includes familiarization, but goes further with actual or simulated hands-on
practice or in-depth knowledge requiring judgment or application of theory.

Responses from the enlisted personnel provide, by total and by subgroups,
the percentage of technicians performing each task and data from which a
task difficulty index can be derived. Data from the commissioned officers
provide, by total and by subgroups:

•  Percentages of responders who believe a task should or should not be
performed.

•  Information from which a task effectiveness index can be derived.

•  The level to which each task should be trained (familiarization only or
hands-on/thorough knowledge).

ANALYSIS OF TASK TRAINING SURVEY DATA

Computer printouts on task training survey response data prepared by NODAC
are [...] reviewed and summarized by HSETC education specialists. Each
task training survey solicits comments and suggested additional tasks from
respondents. These are also reviewed and summarized. Specific criteria are
applied to the summaries to determine whether tasks have a high or low
priority for formal school training and to classify selected tasks according
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Training  Effectiveness!  Learning  Difficulty  Of 
Tasks  Recommended  For  Training 
Cardiopulmonary Technicians

or revision determine the type of curriculum revision mandated by CCR. Three
types of revisions have been identified as follows (CNET, 1981):

•  Type A--Changes in course length, objectives, and subject matter to
such an extent that logistic support, personnel allocations, funds,
and the like are affected. Require project plans and HSETC approval.

•  Type B--Modifications within the established structure of the course,
including major rescheduling of topics, time, or revision of
instructional procedures. Require a Plan of Action and Milestones
developed in conjunction with the formal school(s).

•  Type C--Minor changes such as correction of clerical errors; insertion
of titles or designations of films, publications, and equipment; minor
adjustments in time allocations; and addition of learning activities.
Require HSETC executive correspondence setting forth minor course
changes and tentative completion dates.
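
The three revision types amount to a small decision rule. The following is a minimal sketch in Python, not a procedure taken from CNET (1981): the class names, the two boolean flags, and the ordering of the checks are assumptions made for illustration, and the required actions are paraphrased from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class RevisionType(Enum):
    A = "A"
    B = "B"
    C = "C"


# Follow-up required for each type, paraphrased from the list above.
REQUIRED_ACTION = {
    RevisionType.A: "project plans and HSETC approval",
    RevisionType.B: "Plan of Action and Milestones developed with the formal school(s)",
    RevisionType.C: "executive correspondence on minor changes and tentative completion dates",
}


@dataclass
class ProposedChange:
    # Hypothetical flags; the source states these conditions only in prose.
    affects_resources: bool    # logistic support, personnel allocations, funds
    restructures_course: bool  # major rescheduling or revised instructional procedures


def classify(change: ProposedChange) -> RevisionType:
    """Return the most demanding revision type that applies (assumed ordering A > B > C)."""
    if change.affects_resources:
        return RevisionType.A
    if change.restructures_course:
        return RevisionType.B
    return RevisionType.C
```

Checking the most demanding condition first means a change that both affects funding and reschedules topics is treated as Type A, which seems to match the intent of the definitions but is an interpretation.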

REVISION PLANNING DOCUMENT

Cooperatively, the formal schools and HSETC determine the resource
constraints that will affect effective and timely completion of the curriculum
revision. The type of revision (A, B, or C), the number and experience level
of instructors for the program under review, whether the program is offered at
a single or multiple training site(s) and the type and number of curriculum
support personnel at the training site(s), and the number of programs in the
various stages of the CCR process help determine resource constraints. This
determination affects the level of responsibility assigned to the formal school(s)
or assumed by HSETC for the design and development phases. HSETC education
specialists and school personnel estimate completion dates for these two phases.

DESIGN PHASE

At the design phase, which is actually a redesign of the curriculum, HSETC
and the schools consider task learning difficulty and training effectiveness
data resulting from the analysis phase. The course review conducted prior to
this phase provides a list of learning objectives to be deleted from the course
and a list of learning objectives to be revised. Also provided is a list of
selected tasks to be added to the course. After deleting learning objectives
not supportive of tasks selected for training, the formal schools, with
assistance from HSETC, develop newly required objectives. This process includes
revising existing learning objectives, where appropriate, or writing new
learning objectives for selected tasks not currently provided for in the
curriculum. Student assessment procedures are developed for all new or revised
learning objectives. Both performance checklists and criterion-referenced test
items are created or redesigned for evaluation of students. Finally, the
sequence and structure of the entire curriculum are determined. These are
tentative decisions on the time required for training students to master the
tasks selected for training and the point or place in the training program
where each learning objective will occur. Necessary adjustments in sequence and
structure may follow a pilot phase of the revised curriculum.

DEVELOP  PHASE 

By reviewing appropriate previously used training materials and the revised
curriculum outline, the formal schools with HSETC assistance:

•  Specify learning strategies.

•  Calculate resource requirements of time, manpower, and cost to implement
revisions, and request resource approval from higher authority
if Type A revision is required.

•  Review  existing  materials. 

•  Develop  revised  instruction. 

•  Validate revised instruction.

As a result, program directors and instructors of the training program
under review will have revised lesson topic guides and supporting materials
available for students. These documents will be used in the pilot phase to
validate instruction. A validation report is then submitted by the formal
schools and a course approval letter is provided by HSETC. If subsequent
additions or changes are deemed necessary, higher authority action on a request
for resource allocation is obtained.

IMPLEMENT/CONTROL PHASES

The final two phases are combined to implement the revisions and evaluate
the revised curriculum. The formal schools conduct the training of students
using any adjustments to the curriculum found necessary during the pilot or
validation phases of instruction, while HSETC monitors the outcome. The
evaluation or control phase strategy uses a course evaluation plan with course
review checklists. All evaluative data are analyzed and necessary changes are
identified. If changes are required as a result of the evaluation summary
report, the schools submit to HSETC a request to approve minor curriculum
changes. At this point, the first cycle of the curriculum review is completed.


SUBSEQUENT CYCLES

Subsequent cycles will occur but will require far less time because of the
work accomplished during the first cycle. Specifically, task lists requiring
only minor revisions will be available. Learning objectives will be task based,
and only new objectives necessary to support additional tasks selected for
formal school training during subsequent cycles will need to be developed.
Further, the ongoing evaluation plan and a minor curriculum changes procedure
will keep each curriculum up-to-date. Finally, after the initial cycle, less
critical specialty courses may be scheduled for a CCR on alternate cycles rather
than every cycle.


SUMMARY 


The Cyclical Curriculum Review procedure is based on a systems approach to
curriculum development. It uses subject matter as well as process experts to
select tasks for training based on state-of-the-art content that will produce
highly competent medical and dental technicians. Through the CCR process, the
effectiveness of these technicians in meeting the needs of the Navy is ensured.


Computer Aversion as a Source of Bias in Computerized Testing

Jo  Anna  Wood  and  Gordon  F.  Pitz 
Southern  Illinois  University,  Carbondale 

Tests,  conventional  and  computerized,  are  a  pervasive  aspect  of  our  lives. 
They  are  used  as  part  of  placement  and  selection  procedures  for  employment, 
training  and  educational  opportunities.  In  all  these  instances,  test  scores  are  used 
to  classify  individuals  on  the  basis  of  ability  or  aptitude,  as  measured  by  the  test. 
When the testing procedure itself interferes with the measurement of relevant
phenomena  in  a  non-random  way,  that  procedure  will  discriminate  against  certain 
test-takers,  by  incorporating  an  irrelevant  attribute  into  the  classification  process. 

A  growing  body  of  literature  (e.g.,  Lawton  &  Gerschner,  1982;  Naiman,  1982; 
Nickerson,  1981)  suggests  that  a  person's  beliefs  and  attitudes  about  human- 
computer  interactions  will  affect  that  person's  ability  to  interact  with  a  computer. 
The  term  computer  aversion  has  been  adopted  by  Meier  (1984)  to  refer  to  such 
negative beliefs and attitudes. Computer aversion can hamper one's performance on
computerized  tasks  such  as  data  input,  word-processing,  or  using  databases.  It 
seems  likely  that  similar  effects  would  be  found  when  the  task  is  a  computerized 
test. 


The research presented in this paper is an attempt to address systematically
some shortcomings of published findings on computerized testing and computer
aversion, as well as to provide some validity data for a measure of computer aversion.
The  hypotheses  to  be  tested  concerned  two  dependent  variables,  state  anxiety  and 
test  performance  (number  of  errors).  It  was  expected  that  computer  aversion  would 
negatively  bias  the  results  obtained  in  computerized  testing.  Further  bias  was 
expected  when  the  testing  program  was  "unfriendly"  or  difficult  to  use.  In 
addition,  this  research  was  designed  to  determine  whether  computer  aversion  is 
different  from  two  possibly  related  concepts,  test  anxiety  and  trait  anxiety. 

Methods 


Subjects 


Subjects (N=92) were recruited from non-college student adult populations in
the Alton and Carbondale, Illinois areas. Subjects were recruited from
hospital populations in the two areas. Only English-speaking subjects were recruited to avoid
confounding test results with language abilities.

Tests  and  other  measures 


Verbal and math questions were selected from published SAT tests (College
Entrance  Examination  Board,  1983).  Two  tests  were  constructed,  each  composed  of 
two  math  and  three  verbal  subtests.  The  tests  used  a  multiple  choice  format,  and 
were  timed.  Each  test  was  developed  for  both  paper-and-pencil  and  computer 
administration. 

Meier (personal communication, January 5, 1985) has developed a
questionnaire to measure computer aversion among clinical psychologists (Computer
Attitudes Scale). This scale was modified for use with health care personnel
in the present study. The State-Trait Anxiety Inventory was used to obtain
measures indicative of participants' levels of anxiety during testing (state anxiety),
as well as to obtain a measure of general (trait) anxiety. The Achievement Anxiety
Test (Alpert & Haber, 1960) was designed to measure subjects' perceptions of the
extent to which anxiety is either facilitating or debilitating of test performance.

Computerized test administration and scoring were handled by computer
programs written in Pascal and developed for this purpose. Computerized tests
were administered on a 16-bit IBM-PC compatible machine.

Procedures 


Subjects were randomly assigned to either a friendly or unfriendly program
condition. Total allotted time was the same for both conditions. The unfriendly
program forced subjects to work through each item at a fixed pace and required
complex responses. These demands were expected to cause many errors. The
friendly program allowed subjects to allocate their own time within each subtest,
and used simple response procedures. Subjects were further randomly assigned to
receive either the paper-and-pencil or computerized test first. Versions of the
abilities tests were randomly assigned as either paper-and-pencil or computerized
administration for each subject. Random assignment was handled through a
computer program, with the only restriction being an equal number of subjects in all
conditions.
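
Random assignment under an equal-cell-size restriction can be sketched as follows. This is an illustration in Python, not the authors' program (theirs is not described beyond the sentence above): the function name, the cell labels, and the seed parameter are assumptions, and only the program and order factors are crossed here.

```python
import random
from itertools import product


def balanced_assignment(subject_ids, seed=None):
    """Randomly assign subjects to program x order cells with equal cell sizes."""
    cells = list(product(("friendly", "unfriendly"),
                         ("computer_first", "paper_first")))
    if len(subject_ids) % len(cells) != 0:
        raise ValueError("number of subjects must be a multiple of the number of cells")
    rng = random.Random(seed)
    # Give every cell the same number of slots, then shuffle the slots.
    slots = cells * (len(subject_ids) // len(cells))
    rng.shuffle(slots)
    return dict(zip(subject_ids, slots))
```

Shuffling a list that already contains each cell equally often guarantees the balance restriction, whereas drawing a cell independently for each subject would not.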


Results 


Analysis of Covariance (ANCOVA) was used to test the effects of program
friendliness and test mode (i.e., computerized vs. paper and pencil) on performance
and state anxiety while controlling for the effects of computer aversion. An
additional factor, Order, was included to determine if the dependent measures were
influenced by the order in which the test modes were used.

For verbal performance the significant effects of interest are interactions of
Program (friendly vs. unfriendly) by Mode (computer vs. paper) (F (1,84) = 8.69,
p=.004), and Order (computer first vs. paper first) by Computer Aversion by Mode (F
(1,84) = 5.66, p=.020). The former effect was also significant in the analysis of
overall performance. The latter was not; overall performance included scores on
math subtests that probably represented mostly random error.

The two-way interaction of program by mode is shown in Table 1. Both forms
of the computer programs induced poorer performance than did the paper and pencil
tests, but the effect was greater for the "unfriendly" version.

The significant three-way interaction indicates that one or more of the cells
in the design matrix differed from others in terms of the relationship between
computer aversion and performance. Separate regressions of performance on
computer aversion were calculated for each cell. As shown in Table 2, computer
aversion was significantly and positively related to performance on computerized
tests only when those tests preceded the paper and pencil tests. Computer
aversion was not related to performance on paper tests.

computerized test. 62% of the low computer aversives, but only 26% of the high
computer aversives, had scores at or above the grand median.

The last comparison is most interesting because the "friendly" computer test
most closely approximates computerized testing procedures that would be used in
selection and classroom testing applications. Although it has less overall effect on
performance, this test appeared to be more biased than the "unfriendly" version.
Insofar as this experiment represents computerized testing for selection purposes, it
appears that computerized tests have a built-in bias against computer aversives.

If  the  results  of  the  present  study  are  indicative  of  what  occurs  in  other 
computerized  testing  applications,  then  one  of  two  things  may  occur.  First, 
Computer  Aversion  may  be  related  both  to  computerized  test  performance  and  to 
performance on some criterion measure (e.g., job performance, success in college).
In this case, using computerized tests should result in more accurate predictions of
the  criterion.  In  the  second  scenario,  Computer  Aversion  is  related  to 
computerized  test  performance,  but  not  to  performance  on  the  criterion  measure.  If 
this  is  true,  then  Computer  Aversion  acts  as  a  moderator  variable;  that  is,  it 
affects  the  relationship  between  computerized  test  performance  and  performance  on 
the  criterion,  and  does  so  differently  for  various  groups  of  subjects. 
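
One simple way to look for the moderator effect described in the second scenario is to compare the test-criterion correlation in low and high computer aversion groups. The sketch below is written in Python and is not an analysis reported in this paper; the function names, the cutoff argument, and the use of a two-group split are assumptions for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation; assumes equal lengths and non-constant inputs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def validity_by_group(test, criterion, aversion, cutoff):
    """Test-criterion correlation for low (<= cutoff) and high (> cutoff) aversion groups."""
    low = [(t, c) for t, c, a in zip(test, criterion, aversion) if a <= cutoff]
    high = [(t, c) for t, c, a in zip(test, criterion, aversion) if a > cutoff]
    r_low = pearson_r(*zip(*low))
    r_high = pearson_r(*zip(*high))
    return r_low, r_high
```

A clearly lower correlation in the high aversion group would be consistent with aversion acting as a moderator; a formal test would normally use an interaction term in a regression rather than a split.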
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Table  1 


Cell Means and Standard Deviations for Math and Verbal Errors

                         Friendly Program      Unfriendly Program
                         Mean      S.D.        Mean      S.D.

COMPUTER TEST
  Computer First
    Math                 14.48     (2.89)      16.74     (2.30)
    Verbal               14.43     (5.04)      16.26     (6.80)
  Paper First
    Math                 15.13     (3.68)      16.26     (2.65)
    Verbal               12.87     (4.59)      19.22     (5.48)

PAPER TEST
  Computer First
    Math                 13.17     (3.76)      13.78     (4.06)
    Verbal               12.43     (5.67)      12.04     (6.67)
  Paper First+
    Math                 14.48     (4.24)      14.48     (4.24)
    Verbal               13.83     (6.40)      13.83     (7.69)

+Cell means for these two conditions were calculated as a single
group, since the two conditions were equivalent.


Table  2 

Regression Coefficients for Verbal Errors on Computer Aversion

                         Friendly              Unfriendly
                         b         beta        b         beta

COMPUTER TEST
  Computer First         .31       .49*        .52       .54
  Paper First            .08       .09         .08       .15

PAPER TEST
  Computer First         .20       .29         .17       .18
  Paper First+           .17       .21         .17       .21

NOTES:  (1)  Starred items are significant at p<.01.

(2)  b's are raw regression coefficients, while betas are
standardized regression coefficients.

+These two groups were treated as a single cell for regression


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Orientation,  surrogate  travel,  and  gender  differences 

in  videogame  strategy 

Sharon Tkacz
Army  Research  Institute 

Research has shown that individuals vary in their ability to
process and use spatial information (Kosslyn, Brunn, Cave, &
Wallach, 1983). They also differ in the frame of reference they
use to form memory representations of space (Sholl & Egeth,
1980). Some use an egocentric system (e.g., right or left)
whereas others use a topographic system (e.g., north or south).
Developmental studies (Fick & Reiser, 1982) have suggested that
individuals adopt the topographic, more sophisticated system as
they mature.

Piaget  has  suggested  that  physical  movement  through  the 
environment  is  how  spatial  reasoning  skills  are  acquired.  Goldin 
and Thorndyke (1981) support Piaget's arguments, demonstrating
that  navigation  through  space  provides  a  unique  kind  of  spatial 
knowledge,  procedural  knowledge,  that  cannot  be  acquired  simply 
by  reading  maps. 

The  fact  that  movement  leads  to  procedural  knowledge 
acquisition  has  been  applied  to  navigation  training.  Cohen 
(1980) has shown that the information derived from actual travel
can  be  approximated  by  surrogate  travel.  In  some  cases, 
simulated  movement  may  be  even  more  effective  as  a  training  aid 
than  actual  navigation:  if  only  relevant  information  is 
presented,  irrelevant  information  cannot  be  distracting. 

Much research (Wittig & Petersen, 1979) has demonstrated
gender  differences  in  spatial  information  processing,  including 
the  relationship  of  cognitive  variables  to  sex-role  identity. 
Since  previous  research  has  shown  that  individuals  tend  to 
conform  to  sex-role  expectations,  and  expertise  in  computers  and 
videogames  is  considered  masculine,  it  is  reasonable  to  predict 
that  females  may  not  perform  as  well  as  males.  Whether  lower 
female  performance  is  due  to  a  lack  of  cognitive  ability  or 
adherence  to  sex-role  expectations  has  not  been  established. 

The  research  described  below  investigated  navigation  through 
an  artificial  environment  created  by  a  microcomputer.  The  game 
required  players  to  use  a  topographic  reference  system  to 
indicate  directions.  In  addition  to  dependent  measures  derived 
from the videogame, cognitive components assumed to underlie game
performance were assessed. Cognitive components were also
examined to see if females and males differ in the basic
cognitive skills required by the game.

Method 

One hundred and ninety undergraduates served as
participants. They were administered several psychometric tests,
and played a series of eight videogames requiring them to escape
from a 6 x 6 x 6 cubic maze. Players moved from one room to the
next through openings in the floors, walls, and ceilings by
typing "n", "s", "e", "w", "u", or "d" for north, south, east,
west, up, or down directions. Two types of information available
were current Position (P) and location of the escape room or Goal
(G), each defined by x, y, z coordinates. Combinations of these
two types of information formed the four INFORMATION CONDITIONS:
PG, P, G, or Q.

In the PG condition, information on both the player's
Position (P) and the Goal (G) was provided continuously on the
screen. Subjects in the P or G condition had only one kind of
information available. Those in the Q condition had no
information displayed but were permitted to request both P and G
coordinates. All participants played four PG games, in order to
familiarize them with the game and the keyboard, and then four
more games, in one of the four INFORMATION CONDITIONS.

Results  and  Discussion 

Males  showed  superior  spatial  performance.  (Table  1  shows 
means  for  all  psychometric  tests.)  There  were  no  gender 
differences  on  vocabulary  or  reasoning  tests.  These  data 
indicate  that  males  and  females  differ  significantly  in  the 
skills they bring to the experiment that were expected to underlie
game  performance. 


Table 1.  Gender differences in psychometric measures.

Variable                 Female Mean    Male Mean     t       p

Abstract Orientation     101            113           3.4     .001
Map Orientation          8.5            10.2          2.7     .01
Figural Reasoning        22.6           22.4          .2      *
Mental Rotation          25.6           27.7          2.2     .05
Vocabulary               55.3           55.2          0       *

Game performance was described by ten dependent variables
derived from individual key presses for each game. SCORE
indicates the total time to complete one game. RESPONSE TIME
indicates mean time between any two keypresses. Similarly,
STATIONARY TIME indicates mean time spent in a room. EFFICIENCY
is a ratio of the minimum distance to actual distance between
starting Position and Goal room. REORIENTATION is the rate (the
number of times per minute) that players changed the direction
they were facing. SURFACE RATE is a measure of time spent in
surface rooms of the cube. Similarly, INTERIOR RATE is a measure
of time spent in interior rooms. VISIBLE CRASH indicates how
many times a player tried to go through a wall (visible on the
screen) that did not have a door. Similarly, REAR CRASH
indicates how many times a player tried to go through the wall
behind them (not visible on the screen). Lastly, ERROR KEY is a
measure of illegitimate key presses.
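
The EFFICIENCY ratio can be illustrated with a short Python sketch. The paper does not say how the two distances were computed, so several things here are assumptions: the mapping of keys to coordinate axes, the use of city-block distance as the minimum distance, and the simplification that every legal key press moves the player one room (walls and crashes are ignored).

```python
# Unit steps for the six movement keys named in the Method section.
# Which key changes which coordinate is an assumed convention.
MOVES = {
    "n": (0, 1, 0), "s": (0, -1, 0),
    "e": (1, 0, 0), "w": (-1, 0, 0),
    "u": (0, 0, 1), "d": (0, 0, -1),
}


def path_length(start, keys):
    """Follow legal key presses from start; return (final position, rooms moved)."""
    x, y, z = start
    moved = 0
    for key in keys:
        if key in MOVES:  # any other key would count toward ERROR KEY instead
            dx, dy, dz = MOVES[key]
            x, y, z = x + dx, y + dy, z + dz
            moved += 1
    return (x, y, z), moved


def efficiency(start, goal, keys):
    """Minimum distance divided by distance actually traveled (0.0 if no moves)."""
    minimum = sum(abs(g - s) for s, g in zip(start, goal))  # city-block distance
    _, actual = path_length(start, keys)
    return minimum / actual if actual else 0.0
```

Under these assumptions a player who takes the shortest route scores 1.0, and every detour lowers the ratio toward zero.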

An analysis of variance was performed on the ten dependent
measures with two between-subjects variables (INFORMATION
CONDITION, C, and GENDER, G) and one within-subjects variable
(PRACTICE, P). SCORE and REORIENTATION were the only dependent
measures for which any GENDER (G) effect was obtained (see Table
2). Although almost all measures improved with PRACTICE (P), the
GENDER x PRACTICE interaction (GP) was significant only for
SCORE, suggesting that, while their initial scores may be lower,
females may show greater improvement. The effect of INFORMATION
CONDITION (C) was significant for all variables except RESPONSE
TIME and REORIENTATION, indicating that the rate of key pressing
and the rate of turning are independent of the amount and type of
information available. Finally, a PRACTICE x INFORMATION
CONDITION interaction (CP) was obtained for several measures:
SCORE, STATIONARY TIME, SURFACE RATE, INTERIOR RATE, REAR CRASH,
and ERROR KEY. The fact that this interaction was not obtained
for all dependent variables indicates that improvement in
performance is not simply a function of repeated practice. These
game-derived performance measures may be indices of individual
differences in information-processing capacity, and not subject
to practice effects.


Table 2.  Analyses of variance on ten videogame measures.

Dependent            Source of
measure              variation        F        p

Score                C                76.3     .001
                     G                11.6     .001
                     P                15.3     .001
                     CP               3.0      .001
                     GP               2.9      .05
Response time        P                55.3     .001
Stationary time      C                18.1     .001
                     P                30.0     .001
                     CP               3.7      .001
Efficiency           C                64.8     .001
                     P                2.9      .05
Reorientation        G                7.1      .01
Surface rate         C                6.7      .001
                     P                40.9     .001
                     CP               2.7      .01
Interior rate        C                27.2     .001
                     CP               1.9      .05
Visible crash        C                11.6     .001
                     P                9.6      .001
                     G                4.1      .05
Rear crash           C                51.3     .001
                     P                8.5      .001
                     CP               5.1      .001
Error key            C                9.2      .001
                     P                3.8      .01
                     CP               2.5      .01


Means for these dependent measures are shown in Table 3.
Here, SCORE and REAR CRASH were the only variables for which any
gender difference was obtained, in only two INFORMATION
CONDITIONS (G and Q). These results suggest performance is very
similar for males and females.

Table 3.  Results of t-tests on videogame measures.

                   INFORMATION    Female    Male
Variable           CONDITION      Mean      Mean          t        p

Score              PG             82        61            1.197    *
                   Q              142       99            1.656    *
                   G              414       247           3.073    .01
                   P              639       507           1.493    *
Response time      PG             4.4       4.0           .608     *
                   Q              3.8       3.7           .182     *
                   G              3.5       3.8           .651     *
                   P              3.7       3.3           .987     *
Stationary time    PG             5.5       5.1           .457     *
                   Q              S.8       8.8           .618     *
                   G              4.7       4.8           .170     *
                   P              5.2       4.6           1.075    *
Efficiency         PG             .60       .59           .212     *
                   Q              .56       .63           1.246    *
                   G              .21       .21           0        *
                   P              .11       .11           0        *
Reorientation      PG             .07       1.16          1.726    *
                   Q              .09       .47           1.212    *
                   G              .10       .31           1.243    *
                   P              .09       .20           .808     *
Surface rate       PG             8.28      8.52          .213     *
                   Q              5.95      5.42          .581     *
                   G              6.45      6.90          .623     *
                   P              7.07      8.83          1.872    *
Interior rate      PG             3.59      3.68          .127     *
                   Q              1.65      2.26          1.954    *
                   G              1.33      1.72          1.047    *
                   P              1.01      1.05          .119     *
Visible crash      PG             1.72      1.49          .572     *
                   Q              .85       .88           .103     *
                   G              3.84      2.38          1.692    *
                   P              3.02      2.10          1.530    *
Rear crash         PG             .69       .70           .037     *
                   Q              .48       .21           2.808    .01
                   G              1.36      1.28          .293     *
                   P              2.38      2.47          .224     *
Error key          PG             .11       .20           .976     *
                   Q              .40       .52           .661     *
                   G              .08       .12           .679     *
                   P              .13       [illegible]   .584     *

Data in Table 2 also indicate that performance varies across
INFORMATION CONDITION (C) for every variable except RESPONSE
TIME, suggesting that different strategies are employed in
different INFORMATION CONDITIONS. Rather than indicating
different difficulty levels of the same game, INFORMATION
CONDITIONS may be qualitatively different games, from a
problem-solving perspective. That is, a task that involves
finding a goal without knowledge of your own position may not
have much in common with a situation where your position is
known.

This interpretation is supported by results of stepwise
multiple regression analyses. SCORE was predicted from the
psychometric measures, shown in Table 1, for males and females
separately, and for males and females combined, for each
INFORMATION CONDITION. Table 4 shows amount of variance in SCORE
accounted for by the best combination of two psychometric
predictors. The different INFORMATION CONDITIONS have different
psychometric predictors, suggesting differences in cognitive
components.

In contrast to the absence of gender differences in Table 3,
data in Table 4 indicate that components of performance differ
for females and males. This difference is particularly clear for
condition Q. Cognitive correlates of female performance are
vocabulary and reasoning, neither of which is a spatial measure.
Conversely, the best predictors of male performance are mental
rotation and abstract orientation. Taken together with the data
in Table 1, these results suggest that individuals may develop
strategies that depend on their own skills, rather than
strategies that are task dependent.
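
The idea of a "best combination of two psychometric predictors" can be sketched in Python. Note the swap in technique: the paper reports stepwise regression, whereas this sketch simply searches every pair of predictors and uses the standard closed form for R-squared with two predictors, computed from the three pairwise correlations. The function names and the dictionary input format are assumptions for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation; assumes equal lengths and non-constant inputs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_squared_two(y, x1, x2):
    """R-squared for y regressed on x1 and x2 (fails if x1 and x2 are perfectly collinear)."""
    r1, r2, r12 = pearson_r(y, x1), pearson_r(y, x2), pearson_r(x1, x2)
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)


def best_pair(y, predictors):
    """predictors maps a name to a list of scores; returns ((name_a, name_b), R-squared)."""
    best = max(
        combinations(sorted(predictors), 2),
        key=lambda pair: r_squared_two(y, predictors[pair[0]], predictors[pair[1]]),
    )
    return best, r_squared_two(y, predictors[best[0]], predictors[best[1]])
```

Running such a search separately for females, males, and both groups within each INFORMATION CONDITION would produce a layout like Table 4, although an exhaustive search and a stepwise procedure need not always select the same pair.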


Table 4.  Multiple regression analyses: predicting SCORE
from psychometric measures.

INFORMATION
CONDITION              Predictors                                R-SQUARED

PG        females      reasoning, vocabulary                     .453
          males        reasoning, abstract orientation           .556
          both         reasoning, vocabulary                     .360
P         females      map & abstract orientation                .174
          males        map & abstract orientation                .320
          both         map & abstract orientation                .211
G         females      map & abstract orientation                .254
          males        map & abstract orientation                .177
          both         map orientation, reasoning                .186
Q         females      reasoning, vocabulary                     .206
          males        abstract orientation, mental rotation     .409
          both         map orientation, mental rotation          .177


Summary & Conclusions


Differences  in  INFORMATION  CONDITIONS  demonstrate  that  this 
variable  represents  different  task  requirements.  Thus,  the 
dependent  variables  selected  to  describe  game  performance  seem  to 
do  so  adequately.  since  they  reflect  different  aspects  of  the 
players'  performance  and  strategy. 

Regression  analyses  indicate  that  cognitive  components 
underlying  game  performance  are  not  the  same  for  males  and 
females. Although components underlying videogame performance
differ,  suggesting  gender-related  strategy  differences,  actual 
game  performance  shows  little  variation  attributable  to  gender. 

In  sum,  given  that  videogame  skills  are  well  retained,  fun 
and  relatively  easy  to  acquire,  they  have  much  potential  as 
instructional tools. For example, games simulating navigation
could  provide  a  simple,  cost-effective  way  of  training  spatial 
learning  strategies  and  exercising  navigational  skills.  Since 
individuals  that  have  different  cognitive  skills  demonstrated 
similar  game  performance,  different  strategies  may  be  employed  to 
achieve  the  same  results.  Future  instructional  paradigms  should 
provide  for  flexibility  in  strategy  development  so  that  learners 
may  make  the  best  of  their  individual  cognitive  strengths. 
Further,  the  dependent  measures  employed  here  reflect  complex, 
strategic  behavior  in  a  simulated  environment.  Measures  such  as 
these  may  have  greater  ecological  validity  than  standard 
psychometric  tests  in  predicting  individual  differences  in 
complex, real world behavior.

The Navy's General Unrestricted Line Community:

Career Management and Career Development Problems

Gerry Wilcove

Navy Personnel Research and Development Center
San Diego, CA 92152

Community Description and Background

In 19[illegible], General Unrestricted Line Officers (i.e., General URLs) were
given community status, and a community manager was selected. Given its short
history, it is not surprising that the community is a relatively unknown entity
to people both within and outside the Navy. The community is composed of
approximately [illegible] officers, 80-percent of whom are women. A
disproportionate number of officers are lieutenant or below, although it is
expected that the number of officers selected for executive officer, the second
in command, will triple from 1984 to 1987. The function of this community is to
support the Navy's fighting forces by serving in shore billets as general
managers and specialists in areas such as personnel management, financial
administration, data processing and computer technology, organizational
effectiveness, and communications. There are also limited numbers of General
URLs in operations systems technology, naval systems engineering,
political-military strategic planning, and weapons engineering. Women are
eligible to receive some training on combat vessels and may even become surface
warfare officers. However, duty on combat vessels is restricted by law to
temporary duty under noncombat conditions.

Career Management and Career Development Problems

I will first discuss some of the problems that have made it difficult for
the General URL community to manage the careers of its officers and for
individual officers to develop their careers. I will then describe the policy
and procedural changes that occurred in 1984 that ameliorated some of the
problems. The problems that are discussed were identified from: (1) 1982
questionnaire results from approximately 4[illegible]-percent of the community,
(2) open-ended comments from the same questionnaires (N=[illegible]00), (3) 80
interviews with community members, and (4) conversations with Washington
managers and policy-makers. The problems are not listed in any kind of rank
order.

First, in the policy-making area, the community has been handicapped by the
lack of freedom to control the yearly number of accessions. Instead, these
numbers have been determined for the community in accordance with a "surge-tank"
concept, i.e., in accordance with estimates of the number of billets, or
assignments, that would be vacant X number of years in the future because of a
lack of warfare specialists. Managing careers becomes difficult when officers
from different commissioning years are subjected to different selection ratios
at the same points in their careers. By the same token, a woman developing her
career faces minimal or intense competition depending on her year of
commissioning.

A second problem for General URLs has been the fact that they have been
reassigned upon the completion of a tour by officers from another community;
i.e., Surface Warfare Officers (SWOs). Thus, there has been less freedom to
groom those of demonstrated potential for high-responsibility senior jobs.
Individual officers have complained that SWOs do not know the duties and
requirements of existing assignments or their career potential.

A third career management and development problem has been the lack of
high-level acceptance concerning the General URLs' right to career-enhancing
billets. This obstacle has been manifested in two ways: (1) the tendency of the
Navy to assign warfare specialists, instead of General URLs, to career-enhancing
shore billets when a direct competition occurs, and (2) the tendency to reserve
certain billets for warfare specialists, even though General URLs may be capable
of performing the work. Here, career managers are stymied in their attempts to
develop officers for major shore commands because of restricted opportunities at
lower levels. Conversely, officers attempting to develop their careers in
particular directions have been forced to reformulate their career goals.

A fourth problem has been a complaint about the quality of the billets
available for General URLs. A LT perhaps best summarizes the feelings of
dissatisfied General URLs when she states:

"I feel that General URL jobs are overall the least
desirable within the Navy. These jobs are poorly defined,
usually not operational or competitive, and require no
special education or background. Many are nonessential
and have no clear career path associated with them. They
can be filled by anything from a CWO[illegible] to a LCDR and are
often gapped for long periods of time" (i.e., left vacant).

A fifth problem is that the career path for General URLs has been an
ambiguous one defined mainly in general terms; i.e., an individual should be
sure to perform well in subspecialty and leadership positions. It has been
difficult for career managers (i.e., the community manager, senior officers,
assignment managers, and subspecialty managers) to offer advice when the
"wickets" for career advancement were largely unknown, the criteria for
promotion in competition with warfare officers were unformulated, and the career
advancement potential of various subspecialties was a matter of conjecture.

This career advice problem showed up in survey results. For example, only
[illegible]-percent reported that they had been counseled on the "tickets" that
have to be "punched" for career advancement, and only [illegible]-percent said
they had been counseled on the "blind alleys" that might destroy their careers.

The sixth problem concerns senior officers. The Navy has thus far
concentrated on defining the career path for junior officers. Individuals who
have completed a commander-commanding officer tour complain about the lack of a
career pattern after that point and the lack of opportunities in the upper
levels of the Navy hierarchy. The latter includes policy-making positions in the
"fifth ring of the Pentagon", senior administrative jobs at headquarters and
staff commands (e.g., SURFPAC, RECRUITCOM, the Sixth Fleet), and commanding
officer and executive officer billets at naval training commands and
administrative commands. At the present time, there is no overall Pentagon group
that formulates policy for the General URL community as there is for the other
unrestricted line communities (SWOs, aviators, and submariners).

A seventh problem concerns training opportunities and pipelines, neither of
which has been institutionalized. Survey results showed that only 25-percent
had received training en route to their new assignments. In addition, there is
no department head school for General URLs, and General URLs do not attend the
Prospective Executive Officer or Prospective Commanding Officer School unless
they have been selected for major command.

The eighth problem is the subspecialty system, which is difficult to manage
and difficult for the individual officer to learn and influence. The system
itself is composed of many administrative elements including a placement
division, a division responsible for conducting a zero-based review of the
entire subspecialty billet inventory, manpower claimants, sponsors, designator
advisors, and assignment managers. Ninety-two percent of the survey sample
viewed subspecialties as important for their careers. Yet, there is confusion
among officers on the administrative steps they need to take to obtain a
subspecialty; also, when to obtain one. For example, [illegible]5-percent
indicated that it was important to obtain a subspecialty early in their careers.
And yet, a Navy policy makes subspecialty experience obsolete after five years.

The  ninth  problem  centers  on  dual  career  couples.  Approximately  75-percent 
of  the  married  General  URLs  are  married  to  military  men.  These  couples  want  to 
colocate.  This  desire  provides:  (1)  career  management  problems  for  the  Navy 
which must be concerned with mission requirements and (2) career development
problems  for  General  URLs  who  may  want  to  advance  in  their  careers  rather  than 
simply  "be  employed."  The  Navy  currently  has  a  policy  that  every  reasonable  at¬ 
tempt  must  be  made  to  colocate  couples. 

The tenth problem is philosophical; i.e., there are disagreements about what
direction the evolution of career management systems should take. For example,
some community members want billets that are reserved exclusively for General
URLs or want separate assignment managers. In this way, it is argued that women
will be able to compete more effectively with their male counterparts. On the
other hand, complaints are voiced that such approaches are an example of a
"separate but equal" status that removes women from the Navy's mainstream and
prevents them from competing successfully with male warfare specialists.

Policy Initiatives in the Career Area

In November 1984, the Navy promulgated a series of policies designed to
alleviate some of the career management and development problems faced by the
General URL community. One of the changes was that General URLs, rather than
SWOs, would serve as assignment managers, i.e., General URLs would assign their
own community. The exceptions to this policy are commanders who have screened
for command and CAPTs. These two sets of individuals will still be assigned by
SWOs. The "new accessions" desk will also be manned by a SWO. In brief, the
policies seem to contain elements of both philosophical positions mentioned
earlier.


A second change was the institution of a two-career track: the
Leadership/Subspecialty Track, which is the existing one, and a new Specialty
Track. The former emphasizes both leadership and subspecialty billets,
culminating in commanding officer positions. The new track emphasizes the
opportunity to stay in a subspecialty track, eventually becoming a program
manager rather than a commanding officer. However, a small percentage of those
in the second track will become commanding officers of shore installations
specializing in activities such as computer operations. Approximately one-third
of General URL LCDRs will be accepted into the Specialty Track.

It  is  hoped  that  the  two-career  track  better  defines  the  billets  that  are 
needed  to  advance  in  the  Navy,  that  it  gives  General  URLs  more  options  to  ful¬ 
fill their career goals, and that it provides the Navy with the specialized
skills  it  needs  to  meet  its  requirements. 


A  third  policy  was  aimed  at  stabilizing  accessions  into  the  community. 
While  the  numbers  established  operate  within  broad  parameters,  there  is  at  least 
some  structure  and  community  control  over  this  important  issue.  The  previous 
community manager characterized this policy as the single most important change
within the community.


A  fourth  policy  was  aimed  at  freeing  up  additional  numbers  of  challenging 
and  career  enhancing  billets  for  the  General  URL  community.  That  is,  1,800 
billets  were  identified  that  were  reserved  for  warfare  specialists,  but  which 
seemed within the capabilities of many General URLs. The goal, which was
reached, was to be able to reclassify 300 of these billets so that General URLs
would  be  eligible  for  them. 


Before  reclassification,  General  URLs  were  blocked,  to  a  large  extent,  from 
obtaining billets that represented the operational opportunities needed for
career advancement. The policy change addressed the complaint, previously quoted,
regarding  the  poor  quality  of  billets  available  to  General  URLs. 

A  fifth  policy  clarified  the  definition  of  leadership  positions  below  the 
level  of  executive/commanding  officer.  That  is,  criteria  were  established  for 
division officer and department head billets, and billets were appropriately
recoded. This policy further defines the career path, thereby helping the
individual officer to formulate and plot career strategy.


A sixth policy further defined the career path for officers just entering
the Navy. The problem addressed by the policy was that new accessions were
obtaining limited subspecialty and leadership experience in those situations
where the Navy's needs were defined as being preeminent. The new policy stated
that, under such circumstances, the first assignment will be split-toured or be
a [illegible]-year rather than a [illegible]-year obligation.

A seventh policy was designed to ensure that General URLs, as they compete
with warfare specialists, receive their fair share of leadership positions at
the  lieutenant-commander  (LCDR)  level.  Although  specific  billets  were  not  re¬ 
served,  the  policy  stated  that  75-percent  of  the  LCDR  executive  officer  and  com¬ 
manding  officer  shore  billets  will  be  reserved  for  the  General  URL  community. 
The  policy  also  dictated  that  the  same  type  of  arrangement  be  implemented  as  the 
community  matures  and  sufficient  numbers  of  commanders  are  available. 

Finally,  the  recommendation  was  made  to  increase  the  fields  in  which  General 
URL  officers  are  assigned  so  that  they  have  the  breadth  of  experience  to  fill  a 
wider range of leadership, HQ (i.e., headquarters) and subspecialty billets at
the O6 (i.e., CAPT) level. A lead and an assistant agency were designated to
determine how best to implement this recommendation.

Questionnaires  will  be  mailed  shortly  and  interviews  conducted  to  determine 
the  community's  reactions  to  the  actual  and  recommended  changes  discussed  in 
this paper.

Attitudes, Preferences and Career Intentions of ROTC and Non-ROTC Students

Melvin J. Kimmel

US  Army  Research  Institute  for  the  Behavioral  and  Social  Sciences* 

The  Reserve  Officer  Training  Corps  (ROTC),  the  Army's  main  source  of 
officer personnel, finds itself in a dilemma, for it is being tasked to
recruit and retain record numbers of officer candidates in a shrinking and more
competitive market (Hertzbach et al, 1985). The White male college-bound
population has been its main source of cadets. However, changing market
conditions will necessitate going beyond this traditional pool to include women
and  the  growing  number  of  adolescent  Blacks  and  Hispanics  (McNeil,  1983).  To 
accomplish  its  expanded  mission,  ROTC  must  have  information  on  the  back¬ 
grounds  and  attitudes  of  these  people  in  order  to  develop  effective  recruit¬ 
ing  and  training  programs. 

Over  the  past  fifteen  years,  the  ROTC  Advertising  and  Media  Division  has 
relied  upon  comparative  studies  of  ROTC  cadet  and  noncadet  college  students 
to  develop  its  programs  (e.g.,  Armstrong,  Farrell  and  Card,  1979;  Card  et  al, 
1975; Hicks et al, 1979; and Montgomery et al, 1974). These research efforts
have  generally  drawn  similar  conclusions:  (1)  the  influence  sources  that 
ROTC  cadets  and  noncadets  use  to  make  career  decisions  are  similar;  (2)  the 
ROTC  and  military-related  attitudes  of  noncadets  have  become  more  positive 
since  the  Vietnam  era;  and  (3)  cadets  and  noncadets  differ  markedly  on  val¬ 
ues,  ROTC  and  military-related  attitudes,  preferences,  and  intentions,  al¬ 
though  some  of  these  differences  are  true  for  only  certain  ethnic  and  gender 
subgroups.

The  present  effort  continues  this  line  of  research  with  a  more  recent 
sample  of  ROTC  and  non-ROTC  students.  Its  focus  is  on  various  socialization 
variables  and  military-related  attitudes  that  may  impact  on  one's  decision  to 
join ROTC and pursue a military career.

METHOD 

Subjects.  Usable  data  were  gathered  from  898  college  students  from  11 
campuses with ROTC programs. The sample was composed of 427 first and second
year  ROTC  cadets  and  471  noncadet  students.  The  frequencies  for  the  ethnic 
and  sex  subgroups  in  ROTC  and  non-ROTC  are  presented  below: 

                White            Black           Hispanic
             Male   Female    Male   Female    Male   Female

Cadet         211     71       51      39       30      15
Noncadet      130    123       60      63       60      35

The  majority  of  the  sample  (54%)  were  enrolled  in  southern  colleges;  29% 
came from schools in the northeastern and mid-Atlantic regions of the
country; 11% were enrolled in midwestern colleges; and 6% were from western
schools. These percentages accurately reflect the geographic distribution of
males and females in our sample, but they are not as characteristic of the
cadet-noncadet breakdown or the geographic distribution of the different
ethnic groups. The greatest discrepancies are found in the eastern and
southern college subsamples. Sixty-eight percent of the non-ROTC participants
were from southern schools as compared to only 39% of the ROTC sample,
while 18% of the non-ROTC sample and 41% of the cadets were enrolled in
eastern colleges. An overwhelming majority of the Hispanics (86%) and Blacks
(86%) in our sample came from southern colleges, while the White sample was
more equally divided, with 34% from southern colleges and 42% from eastern
schools.

*The views expressed in this paper are those of the author and do not
necessarily reflect the view of the US Army Research Institute or the Department
of the Army.

Instrument and procedure. University staff members administered a
slightly modified version of the 232-item "Career Attitude Survey" (Armstrong
et al, 1979) during regularly scheduled class periods. The surveys were
administered to ROTC cadets in MSI and MSII classes (The ROTC Basic Course) and
to non-ROTC students enrolled in lower level required courses (e.g., English
101). The survey was composed of items on background characteristics, media
preferences, education and career-related variables, and ROTC/Army knowledge
and attitudes. The questionnaires took approximately 45 minutes to complete.
All answer sheets were returned to a central location for coding, keypunching,
100% verification, and analysis. A 2 (Cadet-Noncadet) x 2 (Male-Female)
x 3 (White-Black-Hispanic) factorial design was used to analyze main effects
and interactions. The F-statistic was used for items associated with rating
scales, and the χ² statistic to analyze categorical data.
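
For readers unfamiliar with the chi-square test named above, the computation can be sketched in a few lines of Python. This is a minimal illustration only, not the analysis software used in the study; the function name `chi_square` and the counts in `example` are hypothetical and are not survey data.

```python
def chi_square(table):
    """Return the Pearson chi-square statistic and degrees of freedom
    for a rows-by-columns table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column categories were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical counts: rows are two groups, columns are "yes" / "no" answers
example = [[60, 40],
           [45, 55]]
stat, df = chi_square(example)
print(f"chi-square = {stat:.2f}, df = {df}")
```

The resulting statistic is compared against the chi-square distribution with the returned degrees of freedom to obtain a p value.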

RESULTS 


Career and Education Influences. A number of variables may influence
one's decision to pursue a military career. Among these are the military
attitudes and experiences of family and friends. When asked to rate on
5-point scales how favorably their parents and friends perceived the status of
an Army officer career, the mean ratings of ROTC cadets were significantly
higher than the non-ROTC student ratings on both perceived parental attitudes
(x=3.92 vs x=3.41, F=35.76, p <.001) and the perceived attitude of their
friends (x=3.30 vs x=3.06, F=13.92, p <.01). Sex and ethnic differences were
found in respondents' perception of their friends' attitude, but not with
respect to their parents. Cadet and noncadet females perceived their
friends' attitude as more positive than did the males (x=3.42 vs x=3.10,
F=7.63, p <.01), and the Blacks' ratings (x=3.88) were significantly higher
than Whites (x=3.18) or Hispanics (x=3.23, F=10.94, p <.001).

When asked whether or not their parents, siblings, and friends had been
in ROTC or the military, a significantly higher percentage of cadets than
noncadets reported friends with ROTC experience (57% vs 50%, χ²=5.50, p <.05)
and parents with military experience (69% vs 50%, χ²=19.25, p <.01). Ethnic
differences also were found on ROTC and military experience variables. Fewer
Hispanics (10%) than Whites (20%) or Blacks (14%) reported parents with ROTC
experience (χ²=6.[illegible]3, p <.05), while a greater percentage of Blacks
reported siblings with ROTC experience (23% Blacks vs 13% each for Whites and
Hispanics, χ²=11.97, p <.01). Blacks also had the highest percentages reporting
military experience for siblings (28% Blacks vs 19% Whites and 15% Hispanics,
χ²=11.97, p <.01) and friends (78% vs 69% and 68%, respectively, χ²=9.08,
p <.05). Whites reported the highest percentage of parents with military
experience (71% Whites vs 44% Hispanics and 38% Blacks, χ²=72.58, p <.001).
The only significant sex difference that held for both cadets and noncadets
was for friends with military experience, where the percentage of females was
higher than males (73% vs 63%, χ²=3.86, p <.05). A higher percentage of [...]

Participants were also asked directly about the sources that influenced
their decision whether or not to participate in college ROTC. ROTC influences
were assessed by asking all participants to indicate which of 14 sources
influenced their decision. The resulting percentages are presented in Table 1.
Cadets and noncadets agreed on four of the most frequently mentioned
influencers: family, friends, personal beliefs, and career goals. Media
advertising and ROTC unit requirements were the least influential of the
sources for both groups. As might be expected, a higher percentage of cadets
than noncadets were influenced by ROTC instructors and other military
personnel. The noncadets more often mentioned personal beliefs, career goals,
and ROTC obligated service. Some significant ethnic and sex differences were
found that characterized both cadet and noncadet groups. Whites chose career
goals more often than Blacks or Hispanics. Blacks did not base their decision
to join ROTC on personal beliefs as much as Whites or Hispanics, but were more
influenced by ROTC recruiters and media advertising than the other ethnic
groups. In addition, a higher percentage of males than females mentioned
economic conditions as a factor in their decision.

Other observed sex and ethnic differences in ROTC influence sources were
significant for cadets, but not noncadets. Specifically, a greater percentage
of ROTC females than males said they were influenced by friends (51% vs
45%, χ²=11.8[illegible], p <.01), teachers (18% vs 9%, χ²=4.15, p <.05), and
ROTC instructors ([illegible] vs 25%, χ²=7.94, p <.05), while more ROTC males
than females were influenced by military lifestyle (24% vs 13%, χ²=8.79,
p <.01) and personal beliefs (34% vs 25%, χ²=4.15, p <.05). The only race
discrepancy between cadets and noncadets occurred in percentages reporting
educational goals as influencers (χ²=10.66, p <.01). Within the ROTC group, a
greater percentage of Whites (3[illegible]%) than Blacks (16%) or Hispanics
(11%) indicated that this influenced their decision. Within the noncadet group,
on the other hand, the ethnic groups responded similarly.

ROTC and Military Attitudes. When asked how they felt about serving in
the military, cadets and noncadets responded very differently (χ²=122.02,
p <.001). The percentages of cadets and noncadets stating that they would
serve if needed were about the same (52% for cadets and 46% for noncadets).
However, a significantly greater percentage of cadets than noncadets stated
that they felt a duty to serve (29% vs 6%), while a much higher percentage
of noncadets (48%) than cadets (19%) indicated that they had not given much
thought to military service. In general, females and Blacks in both groups
were less committed to military service. Forty-nine percent of the females
in our sample indicated that they had not given much thought to military
service; 41% said they would serve if needed; and only 10% believed it was their
duty to serve. In contrast, only 25% of the males indicated giving no thought
to military service, as compared to 54% that would serve if needed and 21%
who believed it was their duty to serve (χ²=59.10, p <.001). With respect to
ethnic group differences (χ²=19.19, p <.001), only 11% of the Blacks saw
military service as a duty as compared to 19% of the Whites and 18% of the
Hispanics, while 47% of the Blacks as compared to only 30% of the Whites and
30% of the Hispanics said they had not given much thought to military
service. Fewer Blacks than Whites or Hispanics also indicated a willingness to
serve if needed (42% vs 30% and 31%, respectively).




The more positive attitudes of cadets were also reflected when participants
were asked to rate the attractiveness of ten aspects of an ROTC program
on five-point scales (see Table 2). Although cadets rated all ten aspects
significantly higher than noncadets, the rank orderings of the program
elements were relatively similar. Among the program elements rated most
attractive were a guaranteed job after college, the scholarship program, program
quality, and program activities (e.g., course modules, social functions,
etc.). Obligated duty requirements, ROTC cadets, and program requirements
were seen as the least attractive aspects of the program by both cadets and
noncadets. The only exception to this consistent pattern was in the ranking
of ROTC instructors, which was ranked first by cadets, but only fifth by
noncadets. In general, Blacks and Hispanics rated these elements more
positively than did the White subgroups. This pattern was significant for
four of the ten program elements: program image, program environment, ROTC
cadets, and obligated duty requirements. Two of the sex differences were
significant (activities and ROTC instructors), with females rating these
elements more positively than the males. These ethnic and sex patterns
characterized both cadet and noncadet groups in all cases except for the female
Hispanics' attitude toward ROTC instructors. Cadet female Hispanics rated
their instructors significantly lower than the other cadets, whereas in the
noncadet group female Hispanics rated ROTC instructors more positively.

Similar patterns of results emerged when respondents were asked to rate
16 aspects of Army life (see Table 2). As with ROTC attitudes, cadet ratings
were consistently higher than the ratings of noncadets, but their rank
orderings were similar. Job security, officer responsibilities, and officer pay
and fringe benefits were rated among the most positive by both groups, while
personal freedom, prejudice, and Army living conditions were among the most
unattractive elements of Army life. The largest discrepancies between the
cadet and noncadet rankings were on "required mobility and travel," which was
ranked fourth by the noncadets but only eleventh by cadets, and "required
discipline," ranked seventh by cadets and twelfth by noncadets. Blacks in
both groups rated all aspects of Army life more positively than the other
ethnic groups, and Whites differed significantly from Hispanics only with
respect to "job security," which they rated higher, and "personal freedom,"
which they rated lower. The only significant sex differences that
characterized both cadet and noncadet groups were in attractiveness ratings of
required travel and personal freedom, which were rated higher by females than
males, and in their feelings about Army training, which was seen as more
attractive by males than females. While ethnic and sex patterns were
generally the same in cadet and noncadet groups, there was one exception for the
element, "officer responsibility" (F=4.87, p <.05). Specifically, whereas
the  mean  rating  for  ROTC  males  was  higher  than  for  ROTC  females  on  officer 
responsibility  (x=3.9A  vs  x=3.83,  respectively),  the  reverse  was  true  in  the 
noncadet  group,  where  the  male  means  were  lower  (x=3.26  vs  x=3.36). 
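
The rank-order comparison described above can be reproduced from the Table 2
means. The following is a minimal sketch, assuming the cadet and noncadet
means as read from the Army lifestyle rows of Table 2; the function name and
the tie rule (tied means share the better rank) are illustrative choices, not
the authors' procedure, and ranks computed from rounded means may differ by a
position from those quoted in the text.

```python
# Mean attractiveness ratings (cadets, noncadets) as read from Table 2.
ARMY_LIFE = {
    "Job security": (4.28, 3.71),
    "Responsibilities": (3.91, 3.31),
    "Pay/fringe benefits": (3.92, 3.26),
    "Officer quality": (3.79, 3.14),
    "Army goals": (3.77, 3.13),
    "Recreation": (3.64, 3.11),
    "Required travel": (3.50, 3.15),
    "Relevance to society": (3.60, 3.04),
    "Discipline": (3.63, 2.86),
    "Daily activities": (3.56, 2.91),
    "Training": (3.60, 2.85),
    "Personal relations": (3.38, 2.92),
    "Public image": (3.24, 2.84),
    "Living arrangements": (3.01, 2.36),
    "Prejudice": (2.67, 2.55),
    "Personal freedom": (2.73, 2.44),
}


def competition_ranks(means):
    """Rank 1 = most attractive; tied means share the better rank."""
    return {
        name: 1 + sum(other > value for other in means.values())
        for name, value in means.items()
    }


cadet_rank = competition_ranks({k: v[0] for k, v in ARMY_LIFE.items()})
noncadet_rank = competition_ranks({k: v[1] for k, v in ARMY_LIFE.items()})

# Elements ordered by the size of the cadet/noncadet rank discrepancy.
rank_gaps = sorted(
    ((name, abs(cadet_rank[name] - noncadet_rank[name])) for name in ARMY_LIFE),
    key=lambda pair: pair[1],
    reverse=True,
)

if __name__ == "__main__":
    for name, gap in rank_gaps[:3]:
        print(f"{name}: cadets {cadet_rank[name]}, "
              f"noncadets {noncadet_rank[name]}, gap {gap}")
```

Listing the elements by rank gap makes it easy to see which aspects of Army
life the two groups order differently even though cadet means are higher
throughout.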

DISCUSSION  AND  CONCLUSION 

Results are remarkably similar to those reported in the earlier research
efforts that compared cadet and noncadet characteristics. In comparison to
noncadets, cadets as well as their family and friends continue to hold more
positive attitudes toward ROTC and military service. Also consistent with
the more recent efforts is our finding that noncadets are not as resistant to
serving in the military as they had been during the Vietnam era. In fact,
there appears to be even less resistance now than at the time of the Hicks et
al. and Armstrong et al. (1979) surveys. In addition, the present research
[illegible] the earlier finding that cadets and noncadets rely on
[illegible] influence sources to decide whether or not to participate in an
ROTC program, although cadets are still more influenced by military personnel,
and noncadets by personal beliefs.

While the observed gender differences also are relatively similar to the
Armstrong et al. (1979) findings, the consistencies are less clear regarding
ethnic groups. Both studies report similar ethnic group differences
with respect to military service commitment, personal feelings about ROTC and
the military, [illegible] attitudes toward military service, and the military
[illegible] of parents and friends. However, the two research efforts differ
somewhat in ethnic group results for friends' attitude toward military
service, [illegible] experience, and the influence sources that students use to
decide whether or not to participate in college ROTC. These discrepancies
may indicate that the background experiences of college students have changed
(at least with respect to ethnic groups), or they may simply be the result
of sampling error. Further research is needed to clarify this issue.

What are the implications of these findings for ROTC recruiting? First,
the fact that noncadets are less resistant to entering military service than
they were in the 1970's suggests that they may be more amenable to ROTC
recruiting campaigns. Second, the results suggest that media advertising
directed toward the potential recruit is not very effective. A better strategy
might be to direct advertising programs toward what were found to be the
major influence sources: parents and friends. Finally, the large number of
ethnic and sex differences observed in attitudes and potential influencers
suggests that a multidimensional approach to ROTC recruiting is needed.
Recruiting programs that are geared only to White males must be modified and
expanded if ROTC is to attract the women and minorities that will make up a
significant portion of the target pool in the 1990's.
Table 1. Percentage of Respondents Indicating Sources that Influenced Their Decision To
Join or Not Join ROTC According to ROTC Membership, Ethnic Background and Gender

                         ROTC Membership      Ethnic Background        Gender
Influence Sources        Cadet   Noncadet    White  Black  Hispanic   Male  Female

Family                    39      3?          38     34     33         36    36
Friends                   40      39          37     44     41         3?    41
Personal Beliefs          32      45          40     27     38         37    36
Career Goals              29      38**        36     27     32*        34    32
Educ. Goals               25      28          28     24     27         26    28
Military Lifestyle        20      30***       26     22     26         24    25
ROTC Instructors          32      5***        17     22     20         17    21
ROTC Recruiters           19      15          11     34     11***      16    18
Econ. Conditions          18      13          16     11     21         18    11**
Teachers/Counselors       12      12          11     15     10         11    15*
Military Personnel        16      7***        12     9      12         12    10
Obligated Service         1       11***       8      5      8          7     8
Media Ads                 5       8           5      11     8**        6     8
ROTC Unit Requirements    3       6           5      3      3          5     4

*  p < .05
** p < .01

Table 2. Attractiveness of ROTC Program and Military Lifestyle According to ROTC
Membership, Ethnic Background and Gender

                         ROTC Membership      Ethnic Background          Gender
Elements                 Cadets  Noncadets   White  Black  Hispanic    Male  Female

ROTC PROGRAM
Guaranteed Job           4.15    3.48        3.76   3.79   3.80        3.79  3.80
Scholarship Program      4.08    3.45***     3.76   3.77   3.76        3.76  3.73
Instructors              4.23    3.09***     3.65   3.63   3.62        3.73  3.49**
Quality                  3.98    3.27***     3.60   3.66   3.?8        3.65  3.54
Activities               4.02    3.11***     3.52   3.60   3.58        3.66  3.36
Environment              3.84    3.02        3.32   3.61   3.43**      3.44  3.37
Image                    3.67    3.01**      3.23   3.53   3.40**      3.33  3.32
Requirements             3.76    2.91***     3.28   3.44   3.32        3.35  3.26
ROTC Cadets              3.60    2.96***     3.17   3.46   3.38**      3.76  3.73
Obligated Army Duty      3.38    2.72***     2.95   3.26   3.04**      3.07  2.98

ARMY LIFESTYLE
Job Security             4.28    3.71***     3.99   4.17   3.71***     4.01  3.94
Responsibilities         3.91    3.31***     3.56   3.71   3.58        3.64  3.54
Pay/Fringe Benefits      3.92    3.26***     3.52   3.85   3.40***     3.52  3.67
Officer Quality          3.79    3.14***     3.39   3.67   3.37**      3.44  3.47
Army Goals               3.77    3.13***     3.34   3.71   3.40***     3.41  3.49
Recreation               3.64    3.11***     3.35   3.55   3.21*       3.38  3.36
Required Travel          3.50    3.15***     3.20   3.71   3.20***     3.19  3.52***
Relevance to Society     3.60    3.04***     3.31   3.46   3.09**      3.34  3.26
Discipline               3.63    2.86***     3.12   3.49   3.26*       3.28  3.14
Daily Activities         3.56    2.91***     3.11   3.53   3.22***     3.26  3.16
Training                 3.60    2.85***     3.13   3.41   3.15*       3.34  3.01***
Personal Relations       3.38    2.92***     3.11   3.30   3.00*       3.14  3.14
Public Image             3.24    2.84***     2.91   3.42   2.95        2.98  3.12
Living Arrangements      3.01    2.36***     2.49   3.06   2.92        2.71  2.63
Prejudice                2.67    2.55***     2.59   2.79   2.44*       2.58  2.61
Personal Freedom         2.73    2.44***     2.43   2.90   2.70***     2.52  2.69*
Data on all maintenance actions [illegible]
collected as part of the MDC system. With modifications, this system
can also keep track of personnel who perform the maintenance actions and
could be used to develop historical summaries or descriptions of the
maintenance actions that constitute a job. The principal objections to using
data from MDC in the past to define jobs have been that the data are (1)
incomplete and (2) inaccurate. However, with the advent of automated
maintenance data collection systems such as the Centralized Data System (CDS)
on the F-16 aircraft and the Automated Maintenance System (AMS) on C-5A
aircraft, these objections are no longer valid. Specifically:

(1) Automated maintenance data collection systems ensure that data is
complete and accurate. Work order generation and tracking ensures
that maintenance actions are recorded; on-line editing produces accurate
and complete entry of maintenance information; a centralized system
contains daily data from all maintenance sites.

(2) Maintenance data consists of a set of standard task elements which refer
to equipment involved (work unit codes) and actions taken (action taken
codes). In addition, the clock time for specific maintenance actions,
although influenced by the general workload level, provides accurate
estimates of relative task times.

(3) Maintenance task data can be collected which identifies each crew member
involved in a maintenance action. This data can be obtained without
imposing an additional burden on maintenance personnel.

(4) Maintenance data on individuals can be aggregated over time to construct
reports summarizing all maintenance actions performed by an individual.
Reports can be constructed as a function of skill level, AMU, or base.

(5) The computer data base and terminals for collecting these data are already
in place, at least for the F-16 aircraft.


A  small  portion  of  a  maintenance  job  description  report  that  can  be 
generated  using  maintenance  data  is  shown  in  Figure  1. 


AFSC: 32[illegible]
Base: MacDill AFB, 58th TTW
Time Period: 5014 to 15064

                                                TOTAL                MEAN         PERCENT
WUC    AT  CDS DESCRIPTION                      TIME SPENT  FREQ.    CLOCK TIME   TOTAL TIME

SUM                                             2352.23     1644.00  1.28         100.00

74A00  L   Adjust fire control radar set          42.64       33.0   1.29           1.81
74A00  Q   Installed fire control radar set        4.00        2.0   2.00            .17
74A00  R   R&R fire control radar set              6.60        5.0   1.32            .28
74A00  V   Clean fire control radar set            8.00        4.0   2.00            .34
74A00  X   T/I/S fire control radar set           61.64       63.0    .98           2.62
74A00  Y   Troubleshoot fire control radar set   181.61      112.0   1.62           7.72

Figure 1. F-16 CDS Job Description Report Excerpt
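
The columns of the report excerpt can be derived from individual maintenance
records by simple aggregation. The sketch below is illustrative only: the
record layout (work unit code, action taken code, clock hours) and the
function name are assumptions made for the example, not the CDS file format or
the software used in the study.

```python
from collections import defaultdict


def job_description_report(records):
    """Summarize (wuc, action_code, clock_hours) records by task.

    Returns one row per (work unit code, action taken code) pair with
    total time spent, frequency, mean clock time, and percent of total time.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of hours, count]
    for wuc, action_code, clock_hours in records:
        entry = totals[(wuc, action_code)]
        entry[0] += clock_hours
        entry[1] += 1

    grand_total = sum(hours for hours, _ in totals.values())
    report = []
    for (wuc, action_code), (hours, freq) in sorted(totals.items()):
        percent = 100.0 * hours / grand_total if grand_total else 0.0
        report.append({
            "wuc": wuc,
            "action": action_code,
            "total_time": round(hours, 2),
            "freq": freq,
            "mean_clock_time": round(hours / freq, 2),
            "percent_total_time": round(percent, 2),
        })
    return report


if __name__ == "__main__":
    # Hypothetical records for one individual; values are made up.
    sample = [("74A00", "L", 1.0), ("74A00", "L", 2.0), ("74A00", "Y", 1.0)]
    for row in job_description_report(sample):
        print(row)
```

Filtering the input records by individual, skill level, AMU, or base before
calling the function would give the group-specific reports mentioned in the
list above.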
Linkage to Current Job Descriptions and Training Standards


Job descriptive information derived from maintenance data supports the
definition of training requirements only if this information can be linked
to current job descriptions and training standards. Only after such linkages
are created can the existing job descriptions and training standards be
evaluated and subsequently revised. Thus, in order to realize the full
benefit of collecting maintenance data on individuals, linkages must be
established for all the AFSs on which maintenance data is collected. Figure
3 provides an illustrative example of how linkages are established among
maintenance tasks (F-16 CDS), occupational survey tasks (OS), and specialty
training standards (STS).


Figure 3. Task Linkages


Summary


Several capabilities were demonstrated in this study which have wide
applicability. These capabilities include using automated maintenance data
collection (MDC) systems to collect task data on maintenance personnel, linking
maintenance [illegible] to current AFS job descriptions (i.e., occupational
survey reports) and specialty training standards, and developing reports which
can be used to [illegible] training requirements tailored to specific groups
(e.g., bases) and to identify training deficiencies.
TRAINING EMPHASIS: THE BEST TASK FACTOR AVAILABLE


Lieutenant David I. Hardy, Lieutenant Colonel Charles D. Gorman,

and Dr. Walter E. Driskill
USAF Occupational Measurement Center


PURPOSE

The paper illustrates three important facets of the training emphasis
(TE) factor. First, TE ratings provide the best data for establishing
training priorities in comparison to the other task factors frequently cited as
needed for curriculum decisions. Second, the reason training emphasis is the
best task factor for establishing training priorities is the significant
correlations that exist between the TE ratings and other task factors. Third, TE
data are sufficient for instructional systems development (ISD) decisions.

Following a brief historical summary of the training emphasis factor and a
discussion of the just mentioned facets of TE data, a brief description of how
TE technology is applied by the USAF Occupational Measurement Center
(OMC) will be provided.


HISTORY AND BACKGROUND

The training emphasis task factor developed from a long, detailed, and
complex research program. Christal (1970) first proposed the gathering of
task factor information using subject-matter experts as the pool of raters from
which samples could be drawn. Mial and Christal (1974) and Mead (1975)
conducted the initial research using the policy-capturing approach to
successfully predict training priorities. In addition, Mead's research was
particularly hopeful in suggesting an intimate integration of ISD practices,
occupational survey data, and curriculum design. In 1977, Stacy, Thompson,
and Thomson reported that standard occupational survey techniques were
reliable for the collection of task training factors.

The first major research specifically on training emphasis was reported
by Ruck, Thompson, and Thomson (1978). This report established the
groundwork for use and analysis of TE data. Some of their recommendations
were that TE data should be collected and not predicted; that training
emphasis ratings be collected while the routine collection of task delay
tolerance and consequences of inadequate performance task factors should be
discontinued; and that ratings be separately collected for each Air Force
specialty.

The TE task factor research was released for operational use to the
USAF Occupational Analysis Program, USAFOMC, Randolph AFB, Texas, in
1979. Driskill and Mitchell (1980) mentioned the extensive gathering of TE
data for use by technical training curriculum developers and training
managers. In 1981, Staley and Weissmuller presented a paper on interrater
reliability in the CODAP programs. They suggested training emphasis is a
stable task factor in noncomplex specialties, but that more research needs to
be done on the application of TE data to complex specialties. Jansen (1982
and further in 1985) tackled the issue of complex specialties and proposed a
common rating policy (CRP) approach which is adequate for all but a few
specialties.


TE vs OTHER TASK FACTORS

Before suggesting an integration of the TE task factor into a prominent
position in ISD theory and practice, a discussion of the correlations between
TE and a number of other task factors investigated by Goldman (1985), and
Ruck, Thompson, and Stacy (in preparation) is necessary.

Goldman, in a different approach, both statistically and methodologically,
reached the same conclusions alluded to by Ruck, et al (1978), that the TE
task factor is the best task factor available and that other "training" factors
are highly correlated with TE. The factors used in Goldman's research are
identified in the Instructional Systems Development (ISD) Eight Factor Model.

In addition to the TE factor, those factors included:

1.  Percent of members performing

2.  Average percent of time spent by members performing

3.  Task Learning Difficulty (LD)

4.  Consequences of Inadequate Performance (COIP)

5.  Task Delay Tolerance (TDT)

6.  Probability of Deficient Performance (PDP)

7.  Immediacy of Performance (IP)

8.  Relative Frequency (RF)

The conclusions reached by Goldman suggest that instead of nine separate
factors, there is only one clearly defined training factor; and, in terms
of predicting critical vs noncritical tasks, rather than nine factors, there is
really only TE. Also, by collecting a minimum number of task factors,
efficiency in data gathering and analysis is significantly enhanced. These
conclusions are a result of the high correlations found between the TE task
factor and the other factors as shown in Table 1.

Research conducted by Ruck, Thompson, and Stacy (in preparation)
has direct impact on the use of task factors in the ISD program. Jansen
(1985) states, "the utility, reliability, and validity of training emphasis
ratings in terms of ISD theory have been demonstrated" by the aforementioned
authors. The thrust of their research was the development of a task
training emphasis scale and some training priority equations. The following
task factors were used in their study:

1.  Training emphasis

2.  Probable consequences of inadequate performance

3.  Task delay tolerance

4.  Learning difficulty

5.  Percent members performing

6.  Percent time spent

7.  Task grade-level [illegible]
The results of Ruck, Thompson, and Stacy's analysis show a very
positive correlation between TE ratings and several other task factors (see
Table 2). The conclusions reached by Ruck, et al, in relation to other task
factors used are that TE ratings are "construct valid" because they can be
predicted by using ISD training factors; TE ratings are reliable since
supervisors give their ratings independently and have high agreement with one
another; and the process used in their study for arranging task lists in
order of training priority should be adopted by the training community to
improve training. Further, they recommend that TE ratings be routinely
collected and "that the consequences of inadequate performance and task
delay tolerance factors be collected for those specialties for which they are of
special interest, since recommended TE ratings include consideration of these
factors."


TE AND ISD

The ISD program for the Air Force is operationally defined in AFP 50-58
(date illegible), then outlined in AFM 50-2 (date illegible). From all ISD
sources, there are seven stated task factors; they are:

1.  Percent members performing

2.  Number of members performing

3.  Consequences of inadequate performance

4.  Task delay tolerance

5.  Task learning difficulty

6.  Frequency of performance

7.  Training development time

From previous discussion in this paper, it has been established that
training emphasis is the best task factor available, and other factors are,
perhaps, redundant, mainly because TE ratings include consideration of most
of the other task factor variables. The key point of Ruck, et al's, report is
that training emphasis should be the most extensively used task factor when
using the ISD approach to make training decisions. Indeed, their research
provided several CODAP computer programs that allow for presentation of TE
data in very effective, usable formats.


CURRENT APPLICATION OF TE TECHNOLOGY BY USAFOMC

Since the release of TE technology for operational use, USAFOMC
routinely has collected TE ratings as a standard part of conducting occupational
analysis. After the ratings have been entered into the computer, the CODAP
program REXALL is used to identify and eliminate deviant raters. This
process is repeated until an acceptable interrater reliability for a single rater
is at least [illegible], and the interrater reliability for all raters is .90 or
more. After achieving the minimum levels of acceptability, any remaining
deviant raters are individually reviewed for retention or rejection, based on the
correlation of their ratings with mean ratings for the total group. At this
point, raters who have been eliminated are examined to determine if there are
any systematic similarities among them. Similarities may suggest the presence
of multiple [illegible] in the career ladder. Also, if more than [illegible]
percent of the raters in the original sample are eliminated, a separate REXALL
should be run, even though there are no apparent systematic similarities, to
determine if another rating policy exists.
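
The individual review of raters, based on how well each rater's ratings
correlate with the group mean ratings, can be sketched as follows. This is
not the REXALL program; it is a minimal illustration in which the function
names, the data layout, and the cutoff `min_r` are assumptions made for the
example.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def screen_raters(ratings, min_r=0.30):
    """Correlate each rater with the group mean and flag low correlations.

    ratings: dict mapping rater id -> list of TE ratings, one per task,
             with every list in the same task order.
    min_r:   hypothetical cutoff; the operational value is not given here.
    """
    n_tasks = len(next(iter(ratings.values())))
    group_mean = [mean(r[i] for r in ratings.values()) for i in range(n_tasks)]
    correlations = {rid: pearson(r, group_mean) for rid, r in ratings.items()}
    flagged = sorted(rid for rid, c in correlations.items() if c < min_r)
    return correlations, flagged


if __name__ == "__main__":
    # Made-up ratings for four raters on four tasks.
    demo = {"A": [1, 2, 3, 4], "B": [2, 3, 4, 5],
            "C": [1, 3, 3, 5], "D": [5, 4, 2, 1]}
    print(screen_raters(demo))
```

In practice the flagged raters would then be examined for shared
characteristics, since a cluster of similar "deviant" raters may reflect a
second rating policy rather than careless rating.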


Once an acceptable REXALL has been achieved, the process of identifying
deviant tasks begins. To do this, the mean of the task standard deviations
[illegible] and the standard deviation of the standard deviations (SDSD)
are [illegible]. Tasks whose standard deviations
exceed this value are defined as deviant tasks. These identified deviant
tasks are then examined for systematic similarities. If the deviant tasks are
random, then the data can be used with confidence. If the deviant tasks are
systematically related, then the data are examined for multiple policies or the
data could be used as they are with a caution to the user that within certain
[illegible] there is disagreement on the amount of emphasis that should be
placed on training.
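
The deviant-task screen can be illustrated with a short sketch. The exact
cutoff rule is not legible in the source, so the example assumes a cutoff of
the mean task standard deviation plus `k` times the standard deviation of the
standard deviations, with `k = 1.0` as a purely hypothetical default; the
function name and data layout are likewise illustrative.

```python
from statistics import mean, stdev


def deviant_tasks(ratings_by_task, k=1.0):
    """Flag tasks whose rating spread is unusually large.

    ratings_by_task: dict mapping task id -> list of TE ratings from the
                     retained raters (at least two ratings per task).
    k:               assumed multiplier on the SD of the task SDs.
    Returns (cutoff, sorted list of flagged task ids).
    """
    task_sd = {task: stdev(r) for task, r in ratings_by_task.items()}
    sds = list(task_sd.values())
    avg_sd = mean(sds)          # mean of the task standard deviations
    sd_of_sds = stdev(sds)      # standard deviation of the standard deviations
    cutoff = avg_sd + k * sd_of_sds
    flagged = sorted(task for task, sd in task_sd.items() if sd > cutoff)
    return cutoff, flagged


if __name__ == "__main__":
    # Made-up ratings: raters agree on T1-T3 and split sharply on T4.
    demo = {"T1": [4, 4, 4, 4], "T2": [5, 5, 4, 4],
            "T3": [3, 3, 3, 3], "T4": [1, 9, 1, 9]}
    print(deviant_tasks(demo))
```

Whatever cutoff is used, the flagged tasks still need the manual review
described above to decide whether the disagreement is random or systematic.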
comparison showed TE ratings to be superior to the other task factors
because of its predictive value and consistent incorporation of other task
factors within its ratings. Next, a short discussion pointed out the
importance of training emphasis ratings in the ISD training development process.

Finally, a brief description was given on how the USAFOMC is currently
applying TE technology. The purpose of this paper is to urge the
incorporation and integration of the training emphasis rating as the single most
useful task factor available to the training community, and to those who are
using the Instructional Systems Development approach for making training
decisions.
[illegible] soldiers who are qualified and
[illegible] the DRAGON antiarmor weapon system currently entails
[illegible] ammunition, ranges, and troop support facilities. The
[illegible] and availability of these resources are constraints on the
effectiveness of the institutional program to produce such soldiers.
[illegible] training devices [illegible] the provision of a broad range of
target engagement conditions.

This paper presents the results of an evaluation of the cost and training
effectiveness of three alternative DRAGON training programs. The programs
were identical in structure, differing only in the training device(s)
available to support the program. Three training devices were involved: the
launch effects trainer (LET), the launch environment simulator (LES), and the
simulated tank antiarmor gunnery system - DRAGON (STAGS-D).

The LET and LES were already being used for DRAGON training; the STAGS-D was
in development. These devices were combined in the following mixes to
produce the alternative training programs:

o  LET and LES (base case)

o  STAGS-D alone

o  STAGS-D and LES

SYSTEM DESCRIPTIONS

DRAGON is a medium range wire-guided antitank weapon. It is fired by one man
from the right shoulder with a bipod supporting the front end of the
launcher. The two major components of the DRAGON are the round and the
tracker. The round consists of a missile prepacked in its launcher. The
tracker is mated to the round before firing. After firing, the empty
launcher is discarded or destroyed and the tracker is used on the next round.
The DRAGON can be fired at night by replacing the day tracker with the
thermal night tracker (AN/TAS-5).

To engage a target with DRAGON, the gunner places the crosshairs in the
tracker sight on the target and depresses both a thumb safety switch and a
[illegible] firing switch. There is a 0.6 second delay before the rocket motor
[illegible] the missile from the launcher. The gunner continues to track
the target until missile impact. During flight an infrared (IR) flare at the
end of the missile allows the tracker to determine the relative position of
the missile to the aim point. The tracker sends commands through the wire
[illegible] to fire pairs of thruster motors on the missile to correct
deviations from the aim point. When corrections are needed, a single pair of
thrusters will fire. These thrusters cannot be fired again.
■  '  ‘  '  .  '  s  ’  •  >  *  •  •  -  \ »*r  ?  -i  0*  ,'t or —  ance

*,  ■■••.*  :- •  !  *i*/  y-frarej  scarce.  A  standard 

'■  ■’  •  '  >-  :f  -  '•  •  t  t-  tre  SySte'O. 

1  .■  •  '  ‘  *-  tvv  .  ..  i'’.  y  -dure,  an  1  to  provije 

•  ■ ..  ■  .  ,1*-  .... ...  *  ....  .yects  aOjI  i 

•■>■  1 '  *vv;  i  '1.-  l-Aj  ,vi-'  . 1 .  .tm ;  a  LET  engagement  the 

■•  "'v1-  *a ;■  :  /.ait  tnrnjqn  the 

■  '  a.  i-  ’f  ty-  li.e  -  v  si' .  To--  1 1  y  ,ner  i$se~bl/  then  fires 

*  '•  y-yal-r  .  ar;rit--  at  t->..  ’ojr  ;f  tne  launch  tube. 

•  •  '  1 1  ■  ’  i  c .-;  _  1  ’  „ -  j  1  a  vs  t r  s'.  a;  tv.  at^s  a  we  i nlr  t  shift 

1  t’v  wevyt  T'ss  '  a  -i^ile  Tea.inq  the  launcher.  The 

1  '•  i  *  /  ‘n-  . £"r  is  r  t  i5  V  j:  y  tnat  'f  an  actual  firing,  neither 

■'  "•  at  r  ■)..,!  ati'o  trr  '~a.p  s'  tetris.  The  gunner  tracks 

;  -  :t  '' s  >:  >f  v .a  *-»'«.  Trip  LET  simulates 

:•  -r  -•  t  >  1"  -  ”  *ters  in  I c*er  vcrc  vnts'  b  /  requiring 

t  tr  r  •  '■  r  ii  ff  ■*  r  ;<r:  •  >  lapsed  f  ime-s  oefore  the  device  terminates 
■7  tne  monitoring  set  (same  one
and  LE 3  is  that  the  LES 
M’1, jilted  by  the  downward
'■<  ir  of  the  launcher.  Because 
*ril  of  the  Army  has  directed 
v.p  times  per  day  or  more  than 
be-  1  !-■  •<  *  ,  :  -fit  s  lor  t  >'*  utuhv  <  •  •••.vd-  1  of  soldier  s  who

'  i>  '  1  ri'*  -1  v  '.f  A' j  ju' ni>r  luri'i'1  .  n--  station  unit,  training 

'  1  ■  '  *  «  '  11,  •  *  root  ri"in  1,  LA.  0. 1-  -» >. !  y  thirty  soldiers  were 


•  ‘  i . .  i'  *r-}'  ?  through  15  November 

•••  •  s  :  :b'.  ~'-e  s  Iher-,  *-r~  r;.njo~ly  assigned  to  one  of 

.1-1  ..-.’.v.  i'°.  E  isnty-seven  percent  of  the 

-.  ti.-‘  A>--  .  -j'  ,is*ves  *it"  tn-_  remainder  being  in  the  National 
•’  *’  >■  -  .-•  ■  e t a-:-- t!i-.s~  types  of  sol'iie^s  in  the 

■  ■  i'  s.  ■  .  i  ’■  $v  tnv  ten  *ee\S  and  h  ad 

t  ’•  '  ■  i . 


•  i  ~'’s  if  the  Infantry  Training 
s.l  tiers  re  jualified  DRAGON 

'-Aj'N  ;  /'-ier  course.  They  nad 
ir.  tn-  'Deration  if  tne  STAGS-D  before 


'  .  N  .  ,  o  .  .!  a‘  "  rt  T-.''r'i".  .-< ; s  tne  base  case.  Tne 

.  >-  . s  i c  t » •  ■ :  .  r  ;  at  ion  of  lecture  and  hands- 

,  *  ’  .  i  *_..  .  jnr'- titles  ise  :  t;  determine  tne 

••t..  .*  ■  -v'tn  ‘ne  '  E"  and  LES  are  used  for 

a- ill*  if  :"i  n  t  b  i  **  s  re  a  ,u-e  t^e  soldier  to  engage  a 
■ '  .  '  .  ■  1  r  , ar  i , ,  i. jofitiois  using  the  Lc.  T  and  tne  LES. 

•  *  •  a t  i  t’- a  i i ng  :  r  zv  i '  s  .-.'••-rt;  structured  in  the  same 
•  ■.  _  r .  ■  r  a  .oor.prnte  substitution  of  training 

’•  >'<•!,-  '  ir..-;  v;th  tne  LE1  and  LES  witn  the  STAGS-D,  while 

v  LE '  .ini  .Sul  the  ETAGS-D  in  Dlace  of  the  LET.  Tne 

; '  ..E  P-;.  1 ,  me  STAGS  pure  P01,  and  the 

jt  :  :  it  a  uor'  et '  it",  si.  Idler  ot  if  icioovy  with  an  actual  DRAGON 
Aim  .  i  m  emulator  mod  dinuj  training  and  soldier  perceptions 
t  "  ■*  *  i  ,  tr  J 1  n  l  ng  pt  "  if  1  ''S  . 

[illegible] proficiency with DRAGON was assessed in live firing
[illegible] end of each week. The live firings were conducted on South
[illegible] at Fort Benning using inert missiles. The target was a manned
tank [illegible] with a driver and tank commander aboard. The
[illegible] alternately right to left and left to right at 5 miles per
hour [illegible] meters from the firing point.

[illegible] firings were recorded on video tape by TRASANA personnel
[illegible] gunner aiming sensor (GAS) system. The GAS consists of a six-power
[illegible] lens, a [illegible] fiber optic bundle, a color video camera,
[illegible], and a monitor. The objective lens, which produces the
[illegible] the DRAGON tracker, is attached to a base plate bonded
[illegible]. The fiber optic bundle carries the image from the lens
[illegible] of [illegible] fibers to the camera enclosed in a
[illegible]. The camera transmits the image to a recorder located
[illegible]. The image is stored on tape and simultaneously displayed
[illegible] viewing of the engagement. The engagement can be
[illegible] in determining hit or miss and to distinguish between a
[...] located on the bracket that attaches the lens to the base plate on
the tracker. The alignment of the GAS and tracker reticles was checked
before and after each shot and proved stable, as only small corrections
were necessary after several firings.

The presence of the reticle on the video tape of a live firing permitted a
detailed analysis of gunner performance. The Data Sciences Division,
National Range Operations Directorate at White Sands Missile Range, New
Mexico, analyzed the tapes using manual and automated techniques to provide
the magnitude of gunner aiming error at various points during the engagement,
the times at which certain critical events occurred, and other data
describing the engagement.

Data relating to soldier proficiency with the simulators were collected
during the two qualification firing tables conducted at the end of each
class. These data include the number of hits recorded on each of the
qualification tables, the number of hits with each simulator, and the
qualification status of each soldier as a result of his performance on the
tables.

The perceptions of the soldiers concerning the simulators and training
programs were obtained through surveys and written comments after the
soldiers had fired the live DRAGON.

RESULTS 

The video tapes of the live firings were analyzed to answer four questions
about each soldier's performance during his DRAGON engagement. First, did
the soldier achieve a successful launch by regaining control of the missile
after undergoing the launch effects (heat, noise, shock, etc.)? Secondly,
did the soldier keep the missile in the air long enough to reach the target;
in this case, about eight seconds? Thirdly, did the soldier fly the missile
through a rectangle containing, but slightly larger than, the target; that
is, did he come "close" to a hit? Finally, did the soldier hit the target?

A  comparison  of  the  results  for  the  soldiers  in  each  of  the  three  training 
programs  showed  no  statistical  difference  in  performance  among  the  groups. 

In  fact,  the  largest  difference  between  any  two  groups  in  any  area  was  six 
percentage  points.  This  analysis  did  reveal  several  points  of  interest 
concerning  the  engagements.  Nearly  half  of  all  the  target  misses  were  caused 
by  unsuccessful  launches.  Almost  all  soldiers  who  achieved  a  successful 
launch flew the missile "close" to the target. Many of the missiles that
came close to the target but missed did so because of excessive tracker
movement by the gunner during the last few seconds of missile flight. These
observations provide the developers and providers of DRAGON gunner training
with specific areas in which increased emphasis can produce significantly
improved performance.

The video tapes also allowed examination of the maximum deflection due to
launch effects of the tracker reticle from the ideal aim point on the target.
Measurements were taken in both the horizontal and vertical planes. Again,
no differences were observed between the groups. It was noted, however, that
those soldiers who hit the target tended to have allowed a smaller deflection
after launch in the horizontal plane than those soldiers who missed. This
tendency did not hold in the vertical plane. DRAGON gunner training has long
emphasized control of the tracker in the vertical plane to reduce the
possibility of grounding the missile during or just after launch. The
results above suggest horizontal control is at least as important in
ultimately achieving a target hit.


These analysis results show that no differences could be found between the
training effectiveness of the three training programs. Similarly, the twenty
year life cycle costs of the programs were found to be nearly the same, with
the largest cost difference being ten percent.

The results of the qualification exercises using the various simulators were
compared to the live-fire results to determine if any relationships could be
found between soldier performance with the devices and performance with the
DRAGON in terms of target hit or miss. The twenty qualification shots are
divided into two tables, called Table III and Table IV, of ten shots each.

The number of shots with each simulator is shown below for each training
The qualification standard used in previous DRAGON gunner courses was at
least eight hits on each firing table. When this standard was applied to the
results of the qualification firings of the soldiers involved in the study,
thirty-one (62%) of the soldiers in the LET/LES program were deemed qualified
while no one in the STAGS pure program and only two (4%) of the STAGS/LES
soldiers were qualified. The qualification standard was determined to be
inappropriate for the latter two programs, since all the programs had been
shown to be equally effective. Considering only the soldiers in the LET/LES
program, the existing qualification standard correctly predicted the hit/miss
outcome of the live-fire engagement for 60% of the soldiers. By contrast, an
alternative standard that ignored Table III and required at least nine hits
on Table IV correctly classified the live-fire results of 68% of these
soldiers. Still another standard that ignored the LET altogether and
required at least four hits out of the five LES engagements on Table IV
provided 70% correct classification.
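
The percent-correct figures quoted above amount to comparing each soldier's
qualified/unqualified status under a candidate standard with his live-fire
hit or miss. The following is a minimal sketch of that comparison, not the
study's own procedure. The record layout, the function names, and the four
sample records are hypothetical and are not the study data.

```python
# Hypothetical records: hits on each ten-shot table plus the live-fire outcome.
sample = [
    {"table3_hits": 9, "table4_hits": 9, "live_hit": True},
    {"table3_hits": 6, "table4_hits": 10, "live_hit": True},
    {"table3_hits": 8, "table4_hits": 8, "live_hit": False},
    {"table3_hits": 7, "table4_hits": 7, "live_hit": False},
]

def existing_standard(soldier):
    """Existing rule: at least eight hits on each firing table."""
    return soldier["table3_hits"] >= 8 and soldier["table4_hits"] >= 8

def table4_only_standard(soldier):
    """Alternative rule: ignore Table III, require nine hits on Table IV."""
    return soldier["table4_hits"] >= 9

def percent_correct(soldiers, standard):
    """Percentage of soldiers whose qualification status matches hit/miss."""
    matches = sum(1 for s in soldiers if standard(s) == s["live_hit"])
    return 100.0 * matches / len(soldiers)

existing_pct = percent_correct(sample, existing_standard)
alternative_pct = percent_correct(sample, table4_only_standard)
```

A soldier counts as correctly classified when he qualifies and hits, or
fails to qualify and misses; the two kinds of error are weighted equally in
this sketch.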

For  soldiers  in  the  STAGS  pure  program,  no  relationships  were  found  between 
performance  on  the  STAGS  and  live-fire  performance. 

For soldiers in the STAGS/LES POI, a relationship was found that involved a
requirement of at least one hit with each simulator and a total of at least
four hits among the twenty qualification shots. This requirement provided
63% correct classification of live-fire performance.

The most notable result of this area of the analysis is that the LES seems to
provide some relationship between performance with the simulator and
performance with the DRAGON itself.


The  compilation  and  analysis  of  the  survey  responses  and  written  comments 
revealed  the  following  soldier  perceptions: 

o all simulators, except possibly LES, needed more launch effects
(noise, heat, smoke, etc.)

o STAGS was much more sensitive than DRAGON to gunner movement

o STAGS hurt confidence and morale

o Difficulties with STAGS made DRAGON seem easier

These perceptions indicate strengths and weaknesses of the STAGS simulator.
Apparently the over-training provided by the STAGS in teaching steady
tracking techniques caused the soldiers to feel they had good control over
the actual DRAGON. The increased sensitivity of the STAGS that provided the
over-training, however, also caused the soldiers to achieve far fewer hits
during practice and qualification than did the soldiers who trained with the
LET/LES. The soldiers hit 78% of all targets engaged with the LET or LES and
30% of all targets engaged with STAGS. This led to confidence and morale
problems among the soldiers training on the STAGS.

SUMMARY 


The analysis of the three DRAGON training programs produced the following
results:


o the programs produced essentially the same levels of performance in
the test soldiers

o the programs cost essentially the same

o performance with the LES provided information about performance with
DRAGON

The following results concerning DRAGON gunner training in general were also
observed:

o half of the live-fire misses were due to unsuccessful launches

o most of the remaining misses were due to tracking instability during
the last few seconds of flight

o tracking control in the horizontal plane during launch was just as
important as vertical control
In today's rapidly changing society, it is important to stay abreast of
current developments, especially within the context of significant military
attitudes which could affect the readiness of our Forces. Much has been
written in recent years about career decision-making, planning strategies,
career involvement, dual careers (Adams, 1980; Davenport, 1984; Hall & Hall,
1979), and overall commitment and adjustment to career and organization.
Moreover, Moskos (1977) has noted changes from an institutional to an
occupational organizational orientation among members of the military. Some
suggest that today's military has less of the rational organizational
devotion and dependency, and more of the entrepreneurial protean man (Hall,
1976; 1979). Hall (1979) also distinguishes between moral and calculative
organizational commitment, with a shift being analogous to the phenomenon
observed by Moskos (1977). In terms of personal decisions, the concern with
career planning and the strategies utilized in making those decisions also
have changed.

The present paper reports recent research pertaining to career attitudes
and proclivities of male and female West Point graduates (Adams, 1984; Adams &
Yoder, 1984). The data to be reported were embedded within the context of a
larger, world-wide study of early adjustment experiences of US Military
Academy graduates. Specifically, strategies of career planning, degree of
career involvement, and overall commitment and adjustment to the Army were
analyzed.

METHOD 

Respondents. The participants were selected by stratifying on sex,
work specialty and geographic assignment. The samples reflected the same
proportions of branch specialty and location as the entire population of
graduates. Three men were selected for each female respondent.

The  two  year  interviews  were  conducted  with  116  members  of  the  class  of  '80 
and  104  members  of  the  class  of  '81  at  locations  in  CONUS,  Korea,  and 
Germany.  A  breakdown  by  gender  and  location  is  presented  in  Table  1. 

Procedure. The two year protocol was the first to include extensive
questions on social aspects of the Army, Army life, and the interaction of an
Army career and family life.

Interviews  were  usually  conducted  at  the  respondents'  duty  station, 
though  on  occasion  commuting  to  a  centralized  location  was  required. 
Interviews  were  generally  scheduled  for  4-6  individuals,  though,  in  practice 
they  varied  anywhere  from  1  to  7.  There  were  5  interviewers  in  all,  varying 
widely  in  number  of  interviews  conducted,  but  all  interviewing  individually. 
Interviewers  also  varied  as  to  sex,  race,  and  military/civilian  status. 
Interviews  were  usually  1  to  2  hours  in  duration,  and  were  all  tape  recorded 
for  later  content  analysis. 


Results  and  Discussion 


Several questions were asked about career involvement, officer role, and
the Army lifestyle. There was a substantial amount of ambiguity about the
meaning of the questions, both among respondents and among interviewers. The
primary difficulty was the meaning of "career involvement." Did it mean
attention on the job and career activities or did it mean intent to remain on
active duty after the five-year obligation? In this analysis, responses
which focused on the former alternative will be discussed in this section,
while those addressing the five-year decision will be considered with a
subsequent question on change in career commitment.

The  most  interesting  results  in  responses  to  this  question  were  in  the 
differences  between  1980  and  1981  graduates  (Table  2).  About  25%  of  the  1980 
graduates  indicated  that  they  were  more  involved  in  their  careers.  The 
figure  for  1981  graduates  was  almost  50%.  There  seems  to  have  been  a  large 
shift  in  career  interest  during  this  one  year  time  frame.  The  shift  occurs 
for  both  males  and  females,  although  females  are  generally  less  likely  to 
become  more  involved  than  males.  A  similar  pattern  is  observed  in  the 
answers to questions on changes in adjustment. Class of '81 respondents are
more likely to have experienced a positive change and males are more likely
than females to have had a positive change. Again, interpretation of these
results raises the question of statistical artifact or real difference. A
significant proportion of the involvement change can be attributed to the
radical change of males stationed overseas. Is this an accident or did the
increased value of the dollar make everything seem rosier?
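
One rough way to probe the "artifact or real difference" question is a
two-proportion z-test on the "more involved" counts for the two classes
(29 of 116 for 1980 and 49 of 104 for 1981, as reported in Table 2). The
sketch below is offered only as an illustration; the paper does not state
that such a test was run, and the function name is an assumption.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for (x2/n2) minus (x1/n1)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se

# "More involved" counts: class of 1980 versus class of 1981.
z = two_proportion_z(29, 116, 49, 104)

# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))
```

A large z would argue against pure sampling chance, but it cannot rule out
other artifacts such as interviewer differences or the ambiguity of the
question itself.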


Answers  to  the  questions  on  what  factors  influenced  greater  career 
involvement  or  adjustment  to  an  officer's  role  shed  little  light  on  this 
issue  (Table  3).  Males  in  the  class  of  ‘81  were  much  more  likely  to  cite 
confidence,  job  satisfaction,  and  familiarity,  but  there  seems  to  be  no  clue 
about  why  these  factors  should  be  more  strongly  felt  on  overseas 
assignments.  They  do,  however,  make  sense  as  reasons  for  assuming  a  higher 
level  of  participation  in  an  Army  career. 


Females  from  the  class  of  ‘80  were  more  likely  to  express  a  negative 
change  toward  their  officer  roles  than  men.  The  majority  of  men  who 
perceived  a  positive  change  simply  felt  more  comfortable  and  familiar  with 
their  role.  The  majority  of  women  who  experienced  a  negative  change  were 
upset  by  different  treatment  for  women  and  felt  that  men  were  just  more 
likely  to  take  the  Army's  "flack"  than  women.  The  latter  comment  probably 
says  more  about  general  attitude  differences  between  male  and  female 
officers,  as  perceived  by  female  respondents,  than  any  other  expressed  in  the 
interviews. Both groups perceive the same problems (for example, both
identify similar job dissatisfaction factors), but 1980 female graduates tend
to be less tolerant of the negative aspects than men. Perhaps it is because
many begin their careers from a more defensive perspective as the first
female West Point graduates; members of the class of '81 do not seem to have
similar views.



Only a small number of individuals indicated a change in adjustment to
the Army lifestyle. The most mentioned negative factors associated with the
Army lifestyle are adjusting to time requirements and mandatory social
functions. The most frequently cited positive factor is getting married.
Six officers were married in the previous year and this seemed to
significantly contribute to their overall adjustment or at least it made them
happier.

NOTE:  This  document  represents  the  views  of  the  author  and  not  the  official 
position  of  the  U.S.  Army,  or  any  other  governmental  agency  unless  so 
designated  by  other  authorized  documents. 

The  interview  tapes  were  transcribed  by  Dr.  Richard  J.  Orend  under  contract 
DAAG  60  85  M  1799.  The  research  program  was  supported  by  contract  13  ARI 
85-28  from  the  Army  Research  Institute  (Jerome  Adams,  principal  investigator). 
TABLE 1

Total Participants

                            Class of 80             Class of 81
                        Number   % of Total     Number   % of Total

Male                       88       75.9           85       81.7
Female                     28       24.1           19       18.3
Total:                    116                     104

Stationed in CONUS*        83       71.6           80       76.9
Stationed OVERSEAS**       33       28.4           24       23.1
Total:                    116                     104

CONUS Males
CONUS Females
Total:

OVERSEAS Males
OVERSEAS Females
Total:

*Including Hawaii
**Germany and Korea
TABLE 2

Change in Extent of Career Involvement

                                    Class of 80        Class of 81
Change in Career Involvement:      Number     %       Number     %

Total Sample
  More Involved                      29     25.0        49     47.1
  Less Involved                      15     12.9         9      8.7

By Gender
  Males - More Involved              22     25.0        41     48.2
  Males - Less Involved               8      9.1         6      7.1
  Females - More Involved             7     25.0         8     42.1
  Females - Less Involved             7     25.0         3     15.8

By Location
  CONUS - More Involved              22     26.5        34     42.5
  CONUS - Less Involved               8      9.6         8     10.0
  OVERSEAS - More Involved            7     21.2        15     62.5
  OVERSEAS - Less Involved            7     21.2         1      4.2


TABLE 3

Factors Influencing Career Involvement

                                              Class of 80             Class of 81
                                          Males Females Total     Males Females Total

Positive - More Involvement

 1. Confidence, job knowledge               2      0      2        10      1     11
 2. Job satisfaction, enjoy work            5      0      5        13      2     15
 3. Security                                2      0      2         1      0      1
 4. Willing to put in time                  2      0      2         3      1      4
 5. Competitiveness                         1      0      1         0      0      0
 6. Think about it more                     0      1      1         3      0      3
 7. Married to another officer              0      1      1         0      0      0
 8. Comfortable                             0      1      1         0      0      0
 9. Sense of duty involved with goals       0      1      1         0      0      0
10. Increased responsibility                0      0      0         2      2      4
11. Thinking about change in assignment     0      0      0         8      1      9
12. More in control                         0      0      0        [illegible]

Negative - Less Involvement

 1. Frustrated ("brick walls")              1      0      1         2      0      2
 2. Lack job freedom                        1      0      1         0      0      0
 3. Current assignment                      1      2      3         0      1      0
 4. No other competition in life            1      0      1         0      0      0
 5. Problems leave individual numb          1      0      1         0      0      0
 6. Not in MOS                              0      2      2         0      0      0
 7. Dissatisfied with commander             0      1      1         0      0      0
 8. Not enough time with troops             0      3      3         0      0      0
 9. Poor commanders                         0      3      3         0      0      0
10. Can't quit/move                         0      2      2         0      0      0
11. Not good career for women               0      3      3         0      1      1
12. Senior officers to me are money
    oriented                                0      1      1         0      0      0
13. Not motivated to keep up in field       0      0      0         1      0      1
14. Not as concerned about always
    doing well                              0      0      0         0      1      1
15. Future in Army seems dim                0      0      0         2      0      2
16. Disillusioned by WP on role of
    officer                                 0      0      0         1      0      1
As a follow-on to Roche's 1979 survey of business executives and Uecker's
1984 survey of Air Force officers exploring the mentoring concept, this study
examined mentoring from the perspective of senior Air Force officers who have
been both proteges and mentors. Respondents were 95 Air Force officers
selected for the 1985 entering class at Air War College. Of these, 58%
responded that they have had a mentor and 48% stated that they have been (or
currently are) mentors for junior officers. Comparisons were made to the
earlier data gathered by Roche and by Uecker with respect to military
background, career factors, and effects of mentoring on the respondents.
Furthermore, roles fulfilled by a mentor were contrasted from the perspectives
of the protege versus the mentor. Results for this sample of senior officers
verified the prevalence and perceived importance of mentoring in the United
States Air Force.


Introduction 

Mentoring  has  been  defined  as  a  relationship  between  a  senior  member  and 
a  junior  member  of  an  organization  in  which  the  senior  member  is  influential 
in  molding  and  shaping  the  career  of  the  younger  member  (Uecker,  1984).  The 
concept  of  mentoring  has  recently  received  considerable  attention  throughout 
the  field  of  management.  Trade  journals,  in  particular,  abound  with  articles 
ranging  from  cross-gender  mentoring  to  reasons  why  one  should  (or  should  not) 
enter  into  a  mentoring  relationship. 

Empirical work in the area is limited to two recent surveys. In a study
by Heidrick and Struggles, Inc., and reported by Roche (1979), 1250 executives
recently appointed to their positions were surveyed. This study found that
"nearly two-thirds of the respondents reported having had a mentor or sponsor"
(Roche, 1979, p. 14). The research also discovered that mentored executives
earned more money at a younger age and had a higher degree of satisfaction
with their jobs and their career progress. Furthermore, those in the mentored
group were better educated and more likely to have formulated and followed a
career plan.

A modified and expanded version of Roche's survey was recently applied in
the U.S. Air Force; Uecker (1984) surveyed 252 officers attending Air Command
and Staff College (ACSC) and Air War College (AWC) to examine the prevalence
and effects of mentoring in the Air Force. Results of this survey were
reported at last year's Military Testing Association conference (Uecker &
Dilla, 1984). Mentoring was not found to be as prevalent in the military
sample (42% versus Roche's 64%), but the effects were similar. Mentored
officers were better educated and more likely to have formulated a career
plan. They were more likely to have been promoted early (a parallel to
Roche's finding of earning more money at an earlier age) and had greater job
and career progress satisfaction (Uecker & Dilla, 1984).
Since the prevalence of mentoring in the Air Force officer corps had been
previously supported, the thrust of this project was to reexamine the
[...] of mentoring and to further investigate the phenomenon from the
mentor's perspective. To accomplish this, a sample of high-potential senior
officers was surveyed to find out if they had had mentors, if they had become
mentors for others, and, in general, to examine their point of view concerning
the mentoring process within the Air Force. The survey, based on those of
Roche (1979) and Uecker (1984), also attempted to estimate the perceived
effect the mentor had on the career of his protege and on the Air Force.


Methodology 


The sample for this study needed to be drawn from a population of Air
Force officers senior enough to have had the opportunity to be mentors as well
as to have had mentors. Air Force policy precluded sending questionnaires to
general officers, who would most closely parallel Roche's (1979) sample in
meeting this criterion. Authorization to survey designees for the 1985
entering class of AWC was granted. These officers, 112 in number, were
lieutenant colonels and colonels (O-5 and O-6) coming out of a variety of
leadership and staff positions including squadron commanders, directors at air
division level, and system program directors. Furthermore, they had been
identified, by the fact of their selection for AWC, as having high potential
for further advancement.


Procedure 


Surveys were mailed to officers at their duty addresses several months
prior to their departure for AWC. Participation was voluntary, and respondents
were assured of anonymity. They were asked to mark their responses directly
on the survey instrument and return it in a postage-paid return envelope.

Because of the frequent negative connotation of mentoring in the Air Force,
[...] instrument was based on Uecker's (1984) questionnaire used [...] the
AWC and ACSC students. The format of the survey was [...] that the officers
would respond to items separately from the perspective of being a protege and
from being a mentor. Eighteen items focused on the officer as protege,
fourteen as a mentor; fourteen items were concerned with desirable
characteristics of a mentor, and ten items addressed personal background,
promotion history, and current satisfaction. Space was provided to list
important characteristics of a protege and other open-ended comments and
suggestions at the end of the survey.

Results


Of the [...] officers sent surveys, a total of 95 responded. The [...] their
commission [...] (55%) versus OTS (37%) or a service [...]. The [...] age at
commissioning was [...] years, although ages ranged from [...] to 33. Most
had an advanced degree (93%) and had at least one below-the-promotion-zone
(BPZ) promotion (37%). The group identified with no single major command
(36%), although the three largest [...] were well represented: SAC (21%),
TAC (16%), and MAC (12%).


Mentoring Experience


Of the 95 respondents, 58 reported having had a mentor who took a
personal interest in them and guided or helped mold their careers. This 61%
rate of mentoring is very close to the 64% reported by Roche (1979) for the
[...] comparing the percentages, no significant difference was found. The
median response for the number of mentors was two, with a range from one to
six. Most (23 of the 58) said their mentor had first exhibited an interest in
them only fairly recently, after the tenth year of service. The majority (35)
indicated that they still had a relationship with their mentor, although most
of these (24) described the relationship as "friendly" rather than "close".
The largest group (26) reported having had a general officer as a mentor; the
next largest group (12) said their mentor was their immediate supervisor.
When asked how much influence their mentor had exerted, most said it was
substantial (25) or moderate (21); only a few chose the extremes of
extraordinary (5) or little (7) influence.

In the second section of the survey, 46 officers (48% of the sample)
stated that they had served as a mentor to another individual. Two-thirds of
these officers (30 of the 46) reported that they currently had one or two
proteges. When asked how long their longest mentoring relationship had
lasted, the modal response was a tie between two and three years. This
finding may be an indication that these officers had only recently moved into
positions of command and leadership where they could be mentors, or it may be
a result of the very mobile Air Force way of life, such that relationships do
not extend much beyond one assignment. When asked how much influence they had
exerted over their proteges, again there was a fairly equal split between
substantial (22) and moderate (24) influence with none reporting extraordinary
or little influence.
Effects of Mentoring

This study failed to find any significant differences between mentored
and unmentored groups with regard to formulation of a career plan, job
satisfaction, or early promotions as had been found in the previous research.
However, the small sample size of this study may have been a limiting factor.
A significant difference was found between groups with respect to career
progress satisfaction (t = -2.85; p < .01), with the mentored group reporting
greater satisfaction.
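
For readers unfamiliar with the statistic, a two-group comparison of this
kind can be sketched as a pooled-variance t-test. The paper does not say
which form of t-test was used, so the pooled form is an assumption here, and
the satisfaction scores below are invented purely for illustration.

```python
import math
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    # Weighted average of the two sample variances.
    sp2 = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, df

# Hypothetical satisfaction ratings (not the survey data).
unmentored = [3, 4, 3, 2, 4]
mentored = [4, 5, 4, 5, 4]

t_stat, dof = pooled_t(unmentored, mentored)
```

With the unmentored group entered first, a negative t indicates that the
mentored group scored higher, which would be consistent with the sign of the
statistic reported above if the groups were ordered the same way.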
For the 46 officers who had served as mentors, there were no significant
differences with respect to early promotions or career progress satisfaction;
however, they had significantly higher job satisfaction than their
contemporaries who had not been mentors (t = -2.25; p < .05).

Roles of the Mentor

This study examined roles of the mentor from the perspective of both
proteges and mentors. This allowed for a contrast between the two
perspectives as well as a comparison to Uecker's (1984) results for proteges
from a similar sample. For each of the ten roles identified by Lea and
Leibowitz (1983), respondents were asked to indicate if the role was "most
important" (assigned a scale value of 3), major (2), secondary (1), or a role
not played (0). Mean responses for each role for this study's group of 46
mentors, 58 proteges, and Uecker's (1984) group of 106 proteges are presented
in Table 1. Rankings of the ten roles within each group are also presented
for ease of comparison.


Table 1

                                                     AWC & ACSC
                  Mentors          Proteges          Proteges
Role              (n=46)           (n=58)            (n=106)

Advisor           2.158   (1)      1.706   (5)       1.853   (2)
Counselor         1.932   (2)      1.708   (4)       1.598   (5)
Motivator         1.793   (3)      1.734   (2)       1.800   (3)
Role Model        1.711   (4)      1.[illegible] (1) 1.924   (1)
Guide             1.675   (5)      1.321   (8)       1.500   (7)
Teacher           1.611   (6)      1.344   (7)       1.441   (8)
Communicator      1.561   (7)      1.093   (9)       1.505   (6)
Supporter         1.500   (8)      1.511   (6)       1.613   (4)
Sponsor           1.343   (9)      1.716   (3)       1.426   (9)
Protector         1.095  (10)      0.713  (10)       0.964  (10)

1  Ratings and assigned scale values were:
   Most Important = 3; Primary = 2; Secondary = 1; Not Played = 0.

2  Data adapted from Uecker, 1984.
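
The table entries are means of the 0-3 scale values, with rank 1 given to the
highest mean in each group. As a minimal sketch of that arithmetic (the
function names and the four sample responses are hypothetical; the mentor means
are those reported in Table 1):

```python
# Scale values as stated in the table notes.
SCALE = {"most important": 3, "primary": 2, "secondary": 1, "not played": 0}


def mean_rating(responses):
    """Average the scale values of the responses given for one role."""
    return sum(SCALE[r] for r in responses) / len(responses)


def rank_roles(means):
    """Rank roles from highest mean (rank 1) to lowest mean."""
    ordered = sorted(means, key=means.get, reverse=True)
    return {role: position + 1 for position, role in enumerate(ordered)}


# Four hypothetical responses for a single role.
example_mean = mean_rating(["most important", "primary", "primary", "not played"])

# Mentor-perspective means reported in Table 1 (n=46).
mentor_means = {
    "Advisor": 2.158, "Counselor": 1.932, "Motivator": 1.793,
    "Role Model": 1.711, "Guide": 1.675, "Teacher": 1.611,
    "Communicator": 1.561, "Supporter": 1.500, "Sponsor": 1.343,
    "Protector": 1.095,
}
mentor_ranks = rank_roles(mentor_means)
```

Ranking the reported means this way reproduces the mentor-column ranks, which
is a quick consistency check on a column whose digits are hard to read.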


The largest difference in ranks occurred for the controversial role of
"sponsor". This term, which often carries a negative connotation in the Air
Force, was rated relatively low by Uecker's (1984) sample and by the smaller
group in this study which rated the roles from the perspective of being a
mentor. Yet, when rated from the perspective of being a protege, the
respondents of this study gave it the third highest mean rating; in fact, it
received the largest percentage of "most important" responses for this sample.

Only one-third of Roche's respondents reported having had two or more
mentors despite their earlier start, while 72% of the AWC officers (64% in
Uecker's, 1984, study) reported two or more mentors.

Roche (1979) found that 62% of the 1250 executives had proteges. In this
study, only [illegible]5% of the officers indicated they had proteges;
interestingly, most of these had previously had a mentor, so the phenomenon
seems to be largely self-perpetuating.

With regard to the effects of mentoring, officers who had mentors had
greater career progress satisfaction, while those who served as mentors had
higher job satisfaction. These results are consistent with previous findings
(Roche, 1979; Uecker, 1984) and with the effects predicted from career
development theories discussed elsewhere (Dilla, 1985).

Contrary to Roche's (1979) findings, no difference was found for this
sample with regard to formulation of a career plan. It should be noted that
Uecker found a relationship for his total sample but not for the AWC
respondents alone. This difference from Roche's results may be due to the
centralized reassignment processing within the Air Force. Respondent comments
indicated that "needs of the Air Force" and unforeseen career opportunities
sometimes dictated career plan changes.

Regarding the roles of the mentor, proteges most often described their
mentor as a role model, although the mentors assigned less importance to this
function. This difference seems natural since it would be difficult for the
mentor to tell to what extent the protege is observing and striving to emulate
his behavior. The most important roles from the mentors' perspective were
those of advisor, counselor, and motivator, roles that are more active but not
highly directive. Proteges were in agreement with the relative importance of
the motivator role but seemed less willing to admit that their mentors had to
"counsel" them. This difference may be due to some of the negative
connotations of the term itself.

There was clear agreement across all groups that the mentor does not
serve as a protector to the protege, or at least does so very infrequently.
It is noteworthy that the mentors assigned greater importance to this role
than either group of proteges; further study of the occurrence of this
function may be merited. Respondents also tended to play down the importance
of the sponsor role, with the exception of the AWC proteges. It appears that
the more senior group of AWC officers perceived that their mentors had
provided growth opportunities for them to a greater extent than did Uecker's
(1984) combined sample of AWC and ACSC students. However, the low rating by
the same group of officers when viewing the roles as mentors is puzzling.
Further research with more detailed definitions of the terms and larger sample
sizes may be the only way to resolve or clarify these differences.

Comments at the end of the surveys indicated that many officers
understood and supported the use of mentoring in the Air Force; however, many
harsh, negative comments revealed stereotypes and misconceptions even at this
senior officer level. The Air Force should publicize the reasons for, and
potential benefits of, the informal mentoring process. Furthermore, research
on mentoring should continue and be expanded to both a broader scope and
higher level of officers in order to better understand the dynamics and
effects of the process.

The purpose of this study was to establish, if feasible, a
pre-change database that the Navy could use to assess the
effectiveness of its revisions to the Surface Warfare Officer
(SWO) career path.

Background


The Surface Warfare community has traditionally stressed
that its officers be "generalists" with diversified work
experience perceived to be career enhancing. However, recent
program and career assignment policy changes have implemented
changes that increase specialization prior to the Executive
Officer (XO) tour. The origin of these significant changes to
the Surface Warfare Officer (SWO) career path can be traced to
the 1981 Surface Warfare Commanders Conference (SWCC) where the
issue of fleet readiness and its relationship to SWO technical
competence and experience was discussed. The SWCC concluded that
the generalist approach to officer development actually
undermined and degraded readiness by not allowing officers to gain the
adequate technical experience required to manage and operate
today's complex shipboard systems. The following FY81 statistics
were cited to support their contention:

--Only [illegible]% of engineering department heads had prior engineering
experience as a division officer.

--Only 38% of operations department heads had prior operations
experience as a division officer.

--Only 45% of weapons/combat systems department heads had prior
weapons/combat systems experience as a division officer.

In every instance, fewer than half of the officers had any
work experience in the type of department that they were being
asked to manage.


As a direct result of the SWCC concern, major revisions
were made to SWO training programs and career policies over the
next two year period. The goal of the revisions, promulgated as
NAVOP 105/83, was to develop more technically experienced
officers at the key department head level, which would in turn be
expected to promote an increase in the operational readiness of
ships. As a department head, the SWO must manage the maintenance,
operation, and employment of complex systems/platforms.
The new career path is tailored to provide them with the
opportunity to gain specialized technical expertise in the first
[illegible] years of their Navy career without the expectation of
being a generalist. Specifically, the new career path is structured for
specialization in one of the three major surface warfare areas:
engineering, operations, and weapons/combat systems.

Advocates of the revised SWO career path insist that this
shift toward specialization was technology-driven and inevitable.
However, critics argue that these newly created "specialists"
will encounter difficulties at the XO/CO level where familiarity
with the entire ship is critical. Only time, and a well planned
evaluation of the impact of the revisions, will prove which of
these contrasting views is indeed correct.

Objectives

The overall objectives of the study were:

1.  Identify existing measures of ship department readiness and
performance.

2.  Build a pre-change (prior to implementation of the new
career path) database consisting of FY82-FY84 data for the most
promising measures.

3.  Assess the consistency across measures and the stability of
individual measures over the three year time period.

4.  Identify measures acceptable for inclusion in a pre-change
database that the Navy could use 3-5 years hence to evaluate the
impact of the career path revisions on fleet readiness.

As the project was of short duration (9 months) and funding
was limited, it was not possible to design and implement
procedures for evaluating career path effects. Therefore, the study
was limited to investigating the appropriateness of using
existing measures of ship department readiness and performance
to assess the impact of the changes.

Method 

A multiple measure, or "performance profile," approach was
taken under the assumption that no single performance measure
could acceptably serve as an evaluative standard for the new
career path. It was theorized that by focusing on essentially a
battery of readiness indices, the impact of the new policies
could be inferred. An analysis of inter-measure consistency and
year-to-year stability of individual measures would indicate if
the multiple measure approach was appropriate.

Two different classes of data were used to build a preliminary
pre-change database. The first class, yielding "ship
data," involved identifying existing measures of departmental
readiness and performance and collecting data for those satisfying
acceptability criteria. The second class provided "personnel
data" that might describe the performance of department heads as
reflected in the personnel records of individual officers.

In order to determine which ship data measures of readiness
were appropriate for inclusion in the pre-change database, and
to provide a means of cross-measure comparison, nine evaluative
criteria were developed. For example, numeric data were required
for quantitative summarization. Additionally, scores could not
be inflated such that improvement potential and variability
suffered. In general, measures had to be objectively scored,
focus on department head performance, be available in a format
facilitating rapid computer analysis, and be well accepted by the
SWO community. These criteria were developed to address issues
of reliability and validity while also reflecting project-related
constraints.

Over fifty interviews were conducted with senior level
officers in the Surface Warfare community to identify existing
measures of departmental readiness. Information collected about
measures included: measurement cycle; scoring procedures; fleet
reputation, etc. The interviews yielded 21 "candidate" measures.
[illegible] measures were then individually scored on the nine evaluative
criteria. A measure was eliminated from consideration if its
standing on any criterion was completely unacceptable.
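
The elimination rule just described (drop a measure if any single criterion
is completely unacceptable) can be sketched as below. The 0-3 scoring scale,
the criterion names, and the candidate scores are all hypothetical; the paper
does not give the actual scoring scheme.

```python
# Hypothetical scoring scale: 0 stands for "completely unacceptable".
UNACCEPTABLE = 0


def screen_measures(scores_by_measure):
    """Keep only measures with no completely unacceptable criterion score."""
    return [
        name
        for name, scores in scores_by_measure.items()
        if all(score > UNACCEPTABLE for score in scores.values())
    ]


# Illustrative candidates and criteria, not the study's actual data.
candidates = {
    "Measure A": {"numeric": 3, "objective": 2, "accepted": 3},
    "Measure B": {"numeric": 0, "objective": 3, "accepted": 2},
    "Measure C": {"numeric": 2, "objective": 1, "accepted": 1},
}
retained = screen_measures(candidates)
```

Under this rule a weak but tolerable score never eliminates a measure by
itself; only a single outright failure does.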

Six of the 21 measures proved to be acceptable on all criteria;
however, data for only three were actually collected
and analyzed. The three remaining measures were not included
because a substantial amount of clerical effort would have been
required to extract relevant information and transcribe it to a
format facilitating computer analysis. The three ship data
measures included in the pre-change database were:

1.  Propulsion Examining Board (PEB) Assessments (both the
Lighting Off Examination (LOE) and Operational Propulsion Plant
Examination (OPPE)).

2.  Nuclear Weapons Technical Inspections (Nuclear Weapons
Acceptance Inspection (NWAI); Nuclear Technical Proficiency
Inspection (NTPI); and Defense Nuclear Security Inspection
(DNSI)).

3.  Departmental Excellence Awards (presented in conjunction
with the Battle Efficiency Awards cycle).

Two types of personnel data were collected: 1) The proportion
of department heads failing to complete their full tour
during the FY82-FY84 period, and 2) The annual number of department
heads involved in a Detachment for Cause (DFC) action. The
former is often informally indicative of poor job performance
while the latter is the formal administrative action for relieving
an officer of his duties. The assumption is that the incidence
of both of these occurrences would decrease if the new
career path was having its intended effect.

[...] consistency and "reliability." While perfect consistency from
year-to-year was not expected, 10%-15% variation was the maximum
acceptable. Second, measures were assessed for trends in rising
or falling pass rates across the three year time period.
Finally, cross-measure comparisons were made to evaluate the
extent to which the various assessments of departmental readiness
were in agreement.

The generally poor stability of ship data performance measures
is reflected in Figure 1 which presents LOE and OPPE pass rates
for several ship types. Substantial year-to-year fluctuations in
pass rate percentages are evidenced in the variability columns.
For the LOE, variability was unacceptably high for both ship
types presented. That is, yearly pass rate differences ranging
from 30%-50% do not provide the stable performance baseline
required to evaluate effects of the new career path. While OPPE
variability was marginally acceptable for DD/DDG (11%/16%), it
was completely unacceptable for LSD/LST (50%/89%). No consistent
trends in pass rates were evident for either measure. Pass rates
appeared to rise and fall in a random, unsystematic manner.
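
The stability screen can be illustrated with a short sketch. The paper does
not define how the variability figures were computed, so this assumes
variability is the spread between the highest and lowest yearly pass rate,
and uses the upper end of the stated 10%-15% tolerance. The function names and
pass rates are hypothetical.

```python
def variability(yearly_pass_rates):
    """Spread between the highest and lowest yearly pass rate, in points."""
    return max(yearly_pass_rates) - min(yearly_pass_rates)


def is_stable(yearly_pass_rates, tolerance=15.0):
    """True when the spread stays within the assumed 15-point tolerance."""
    return variability(yearly_pass_rates) <= tolerance


# Hypothetical three-year pass rates (percent), not figures from the study.
steady = [62.0, 70.0, 73.0]
erratic = [40.0, 90.0, 55.0]
```

With only three yearly observations per measure, a range-based spread is
sensitive to a single unusual year, which is worth keeping in mind when
reading the variability columns.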


FIGURE  1 

Examples of LOE and OPPE Pass Rate Instability

All numbers expressed as percentages.


In contrast to PEB inspections, NWTIs displayed rather
good stability. A summary of NWTI pass rates for four ship
types is provided in Figure 2. Several of the variability
percentages fall below the 10%-15% tolerance previously discussed.
Pass rates were exceptionally stable for Auxiliary class
ships where the variability was only [illegible]% in the Pacific fleet and
[illegible]% in the Atlantic fleet across the three year period.


FIGURE 2


Examples of NWTI Pass Rate Stability

All numbers expressed as percentages.


Inter-measure comparisons made on a year-to-year basis
indicated that there was little consistency among readiness
indices. For example, there was essentially no relationship
between ship types' pass rates on the OPPE and on the NWTI.
Although all measures ostensibly assess the operational readiness
of ship departments, inter-measure consistency was lacking.

The final ship data measure, departmental excellence
awards, displayed both excellent (less than 5% variability) and
poor (more than 35% variability) stability. There appeared
to be some type of interaction between ship type and award
category as awards were often stable for one ship type but
highly volatile for another. Because of this inconsistency,
departmental excellence awards do not provide the stable
performance baseline desired.

Analysis of personnel data measures revealed that neither
was appropriate for inclusion in such a pre-change database. The
percentage of department heads leaving the job early was quite
small (maximum of 10.3%) and no year-to-year trends were evident.
Figure 3 provides a summary of performance-related DFCs by
major department for the three year period. Once again, very few
department heads are represented in this measure as there
were a total of only 59 such actions. Variability of DFCs
year-by-year was greater than desirable and no trends were
identified. For example, while DFCs were steadily on the rise
in the Atlantic fleet, a similar trend was not evident for the
Pacific fleet.
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FIGURE  3 


Performance-Related DFC Actions for Major
Department Heads in FY82-FY84

Analysis of ship data performance measures indicated that
such indices were not stable enough to establish a performance
baseline for assessment of career path influences. The NWTI
was the only measure to display any promise for such an
application. In addition, consistency among multiple measures of
readiness and performance was not evident. Results from one type
of evaluation were not in agreement with those from another on a
year-by-year basis.

Personnel data measures proved to be equally unacceptable for
evaluating the impact of the new career path on readiness. In
both instances, the number of department heads represented was
very small and stability was lacking.

In conducting analysis of these existing measures of departmental
readiness and performance it became evident that there are
a variety of factors beyond the scope of the department heads'
control that can moderate results. While a number of situational
(e.g., experience level of department personnel, CO influence)
factors are essentially random and other miscellaneous factors
(e.g., age of ship) can be statistically controlled for in
analysis, there remain a plethora of systemic factors whose
influence cannot readily be identified. For example, such
factors as the fleet's operations tempo, changing evaluation
standards, budgetary limitations, etc. can dramatically affect
readiness assessments. The influence of these systemic factors
undoubtedly contributed to the poor stability of individual
measures and lack of inter-measure consistency.

In summary, multiple measures of departmental readiness and
performance evaluated in the present study should not be used to
assess the revised SWO career path. In light of the poor
stability documented for existing measures of ship department
readiness and performance, it is recommended that the Navy
conduct a critical review of these measures in an effort to
improve their reliability and validity.

[...] communicative tasks discussed above was collected via a task difficulty
booklet. Difficulty was defined as the amount of time needed to learn to do
a task satisfactorily. Respondents were asked to rate each task on a
nine-point scale according to its relative difficulty, compared to the other
tasks.

Third, using an inventory with the same leadership, management, and
communicative tasks discussed above, respondents were asked to rate each task
on a 10-point scale according to its need in Air Force educational programs.


A fourth set of data was collected via a survey containing a list of
topics from the curricula of officer PME courses. For each of these
topics, respondents were asked to rate the extent to which knowledge of, or
skill in, each topic was necessary to perform their present job (need-in-job).


Using this list of topics, respondents to a fifth booklet were asked to rate
the extent to which knowledge of, or skill in, each topic was necessary to
function as a career officer (need-in-career). An 8-point scale was used to
rate the topics in both of these surveys.

These surveys were validated and approved by representatives of the
various PME schools. Random samples were selected for administration of the
surveys between June 1983 and April 1984. Representative samples across the
surveys were achieved.


RESULTS


The analysis of the task performance data showed a pattern of
increasing involvement in leadership, management, and communicative tasks as
officers increased in rank from lieutenant to colonel. Supporting this
pattern were data that showed the percentages of officers who had supervisory
responsibilities increased from 38 percent among lieutenants to 93 percent
among colonels. Additionally, the percentage of total job time spent on the
tasks in the survey increased from 56 percent to 81 percent, from lieutenant
to colonel, respectively. Related to these was the organizational assignment
pattern, which showed the manner in which the percentage of officers assigned
to organizational levels changed as rank increased. This pattern of increasing
involvement is not surprising, but it does illustrate the changing nature of
most officers' responsibilities. Further, it provides some rationale for a
continuing multiphased professional development program.

The data showed a great amount of diversity in the tasks performed
across career fields and ranks. At the same time, there was a substantial
amount of similarity in certain tasks performed, particularly in some tasks
dealing with communication skills and motivating others. Further, the
differences between ranks in relative time spent on tasks in each duty were
very small.

The analysis of the education emphasis data revealed a low reliability
of raters. In short, insufficient agreement on the amount of education
required to perform tasks existed among officers in general and across career
fields and ranks. This finding reemphasized the diversity of opinions and
needs concerning USAF officer PME.

The reliabilities of task difficulty data were quite high. Fifty-six
tasks received high difficulty ratings from the total group of raters, and
most were communicative tasks, such as drafting or writing relatively high
level documents (officer effectiveness reports, plans, staff papers, and
reports). Other highly rated tasks involved skills such as determining
resources; administering disciplinary actions to civilians; and ordering,
persuading, or influencing those superior in rank or position. Fifty-eight
tasks received low difficulty ratings. Many of these, also, dealt with
communicating and motivating, but were of much lower-level activities, such as
drafting or writing short note replies and reading professional publications.
Other lowly rated tasks include providing informal feedback, attending
training sessions, and maintaining appearance standards.

Analysis of officers' self-perceived need of various PME topics in
their jobs and in their careers showed a great deal of diversity within most
of the career fields and ranks. In spite of this diversity, it was possible
to create a rank order listing of topics from each of these groups displaying
the relative need of these topics in the job or in the career. In general,
the data showed those PME topics which officers believed were needed in their
jobs were topics they also needed in their careers; conversely, they generally
felt those topics not needed in their job were not needed in their careers.
Only minor differences were seen in perceptions across these two studies.

Communication topics (e.g., effective listening, active writing, logical
thinking) were rated as most needed by officers in their jobs and careers;
senior officers saw the needs as about equal, while junior officers perceived
them as greater in their job than over their career. Topics on the military
environment, national security, and military employment generally received the
lowest relative need ratings, both in terms of in the present job and over the
career. Rated lowest were topics on other services' policies and doctrines,
economic theories and systems, and foreign relations.

Background data, collected in all five surveys, were used to assess,
among other things, officers' job satisfaction and perceptions of benefits
from PME. Of the 10,607 responses to the 3 parts of the project, 10,177 were
used in assessing background data responses. (The difference here reflects
the elimination of duplicate responses of officers who were asked to complete
more than one kind of survey booklet, since background questions across
booklets were identical.)

While the job satisfaction indicators were high, perceptions of
benefits from USAF PME were, at best, mixed. Those who participated in
precommissioning PME at the U.S. Air Force Academy indicated the highest
degree of benefit, while the extent of benefit from ROTC and OTS PME was much
lower. Officers indicated very low benefit from SOS by correspondence, and
lower benefits in general from PME by correspondence or seminar than through
residence programs.

While the [illegible] of PME topics increased over the years,
certain results were consistent. Analyses of task involvement over time, for
example, showed a consistent increase in the level and number of leadership,
management, and communicative tasks with every increase in rank. Differences
in task performance across utilization fields were also observed across
studies; operations personnel, for example, were consistently lower performers
compared to other fields, [illegible] increasing the scope of their
leadership, management, and communicative responsibilities as they increased
in rank to the point of equality with other fields at the grade of colonel.

Task difficulty data collected in this and the [illegible] studies generally
agreed on the tasks officers considered most and least difficult. Heading the
lists were drafting or writing high-level official correspondence, conducting
high-level investigations, and determining resources. Low on both lists were
attending training sessions, maintaining personal appearance standards, and
drafting or writing low-level correspondence.

The perceptions of need of particular curriculum topics were analyzed
differently in this study than the data in the two previous surveys, but two
similar findings were evident. First, there was a great amount of diversity
within sub-groups analyzed as to what PME topics officers perceived they
needed. Second, there was a general overall consistency as to what were the
most and least needed topics within sub-groups. While previous studies used
average ratings and this one used a rank order method to assess need,
agreements on self-perceived needs across time were striking: topics dealing
with oral and written communication, leadership, and principles of management
were consistently viewed as among the most needed, while those dealing with
Army and Navy doctrine, international relations, and warfare were perceived as
least needed.


t  >rv  ;  -1 
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:m  V.i’.i  i. it  ion  Protect  produced  K  . 
Air Force Officer Management

;  ; ;  ,1  r •.  •  iuca;  :  ■>  1  Refit*  j  remenl  s 

.  ’  s’)  -r.-.r  orv  in  Oe.emSer  !  4  A  a 


A comparison of the perception of benefits from the various methods of
PME between the two most recent studies showed few differences. Both studies
found the perception of benefits from SOS by correspondence to be small, from
residence programs to be greater than for correspondence or seminar programs,
and from USAF Academy PME to be of greater benefit than ROTC or OTS-OCS PME.


1 


A series of related computer programs, called the Comprehensive
Occupational Data Analysis Programs (CODAP), was applied to the data to
assist in the analysis. Reliability of individual raters was .30 and
reliability of raters as a group was .9[?], both of which were well
within the range of reliabilities accepted in USAF occupational
[word illegible].

9. Correlations of rankings of need-in-job by rank ranged from a high of
[value illegible] between majors and lieutenant colonels to a low of .91 between
lieutenants and colonels. Lowest correlations were between rankings by
direct-commission officers and academy graduates ([value illegible]) and between
cartography-geodesy intelligence specialties and other utilization
fields. Correlations of rankings of need-in-career by ranks,
commissioning sources, and utilization fields were slightly lower, but
comparable.
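
The agreement between two groups' rankings, as described in item 9, can be illustrated with a Spearman rank correlation. This is only a minimal sketch, assuming untied rankings. The ranks below are hypothetical and are not taken from the survey, and the source does not state which rank correlation coefficient was actually computed.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for two untied rankings of the same items."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("rankings must cover the same number of items")
    n = len(ranks_a)
    if n < 2:
        raise ValueError("need at least two items")
    # Sum of squared rank differences across items
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Hypothetical ranks assigned to six curriculum topics by two officer groups
majors = [1, 2, 3, 4, 5, 6]
lieutenant_colonels = [1, 3, 2, 4, 5, 6]
rho = spearman_rho(majors, lieutenant_colonels)
print(round(rho, 3))
```

Two groups that swap only adjacent topics still correlate highly, which is the kind of pattern the reported coefficients describe.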
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Gerald E. Larson and Bernard Rimland
Navy Personnel Research and Development
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A battery of visual, auditory, reaction time and microcomputer generated
measures of information processing were administered to college students. [...]
In addition, each subject was given a battery of tests of individual
differences in right and left hemisphere functioning. Criterion measures included
[...] Block Design measure of intelligence. Results revealed a
general processing speed factor in addition to task specific sources of variability. Moreover, the
speed tasks loaded on the same second order factor as did the traditional measures of
intelligence and aptitude. The findings support the theoretical view that processing speed may be
a general factor in individual differences in performance on complex intellectual tasks. An
important objective of future work in this area is to separate and evaluate the common and
task-specific variability on processing speed tasks, such as those used in this study, which
have little or no intellectual content and involve little or no complex problem solving.


[...] the Psychology Clinic, San Diego State University, San Diego,
California. Funding for this work was provided by a faculty fellowship from the
American Society for Engineering Education and by Future Technologies for Manpower and
Personnel [...]. The authors wish to express their appreciation to
Steven [surname illegible] for his assistance in the statistical analyses.


* The views expressed in this paper are those of the authors, are not official, and do not
necessarily reflect the views of the Navy Department.




.  >  >  -ju-  '  !  s;  >*.  -  -,r  IV.  pf-r  c  ipsjn  At  '  -/pgr*  yy  !p  prlir  '  irtfj 

be'  '  •'  f  ~-j  i"'  ’  .  *’^f  c  f  p-  rvef>  >  -vir  /  :.,r  :.ew  appo wJies  to  tne  measur emen*  c-( 
•  ’  •’  \  :"V”'.jpr, ».  j'p-,,  r,  •pp'vyh  '  ?fer  reij  hs  ar  i.hron'*metr  u" 

1  *  *- '  t''  '  ’ c>rr‘  p* '  *  7\  ’  I  °  '}r*  * V  *  '  '••  pp<x^  M  pf'jr  rvs  »pp  »-'r'YDCc  »',q  T  Kp  npl  ppf )  'j | 

‘ ;  f  -'  'r>  >  «rM-T.»-r:  r,  rrt:!;i0r  y  ;p!!, , -•-.wfvPr  <:  ur„. iear  Joe  to  a  Timned  data  baht 

~er  '  j\  yr‘  ,:-r.ge\ ’Jer.ce  *  hough  ‘‘^freponde'-aiirv of  Judies  have  revealed  3 
"■■r  b  j’  '  iCr'l!’  an*  rf  rfj|3t  >0n  pe'wrijn  pr  or  ex  mg  ^pred  k  -  ,y,tj  those  love!  -'’fig  complex 
■'  «"Prrj.  •  ”>  ~  ear.iojcd  iUf  o  'm  ‘mgs  ,1  j  matter  ;t  „orar  jversv  1  pee  ^urd  :  v>?C , 


n  '■  • r'  ‘vf.  AC-  M3'r*ed  r'Xa;  IU'V.1  ■'¥  Vd  react  mi  trme  measures  01  f  COfeisTiQ 
f** 1 "  ‘  -  p  t.-1;  Am  'j'>  e'p  '•'■Jd u  va'  r  !',f .  r*  .  ;pq 'T'!'r,xornp,j*pr  pont'cHfid 

■■•)►'''  °vr-jV^:r  ••rg  speed  3? '.'JOQp.^ec  p\  "r Pirr-iarc  andtrysiv  ■  i  >'7>  r  inaMv . 

■*■  " ! ‘-d fr  :►  ’...'f.Wri  :nl  \ m,e  mea'ung  rd  per  lor  manr*- on  a  pirncc-Xinq  spr-ed  fas! 

..  ; v  -  •  "  ' >-  •  V*  *0  “■n  'pn--  r.'f.q  ‘  ••>  - gM  yy  ;rf;  ,'pr^,r  ai  vm •-><• 

Method

•  1  vi[MP-t.  •.(»  '.or-  ;,r.iv'-rv!v  ptorfRrt.,  (nirr,  or.  tntrodurtors  course  \r, 

:  ,<Ti  wr-  Dk'wprp  '  ?  3fid  r  J  yp^r '  0!  cKy-  •'<*,  were  rr.^ie,  fo^'y-f fires 

.P^a,p  [  --'y  wpre''airr:3r  '^om-rig  1  c.  wem  r'lgpJf  Mi'pamc,  andAf.ian  The  ramp  Ip 

vV  1  ‘  Opf  PVT.m  ’ ,  f*  '!  tTiP  *rtt  111  TPi  iIjIa!  i»^n  rtf  ^<ri  <Ut(\artlc  ir.  !Ko  IP.')')  a<v.  ronrtA 

-  -  -  '•  '  -*  *  ^  ‘  ■>  J'  /  if  MV  M.«  L/  •  /VJU  »  'J1 

[...] consisted of a set of visual, auditory and reaction time processing speed tasks.
Subjects were also given the Vocabulary and Block Design subtests of the Wechsler Adult
Intelligence Scale, Revised (WAIS-R) and the Cognitive Laterality Battery.

Two types of visual tasks were used, with the order of presentation alternating between
subjects. The first type involved tachistoscopic presentations in a paradigm that followed
[citation illegible] (1979). A microcomputer presented battery of five
[...] processing speed [...] described by Larson and Rimland ([year illegible])
represented the second type of experimental task. Each microcomputer test was presented twice to
subjects [...] in a counterbalanced design. Auditory
[...] processing was measured by the [...]. A reaction
time task, as described by Larson and Rimland ([year illegible]), was presented on the TRS-80 computer
[...]. Median reaction time (based on 1[?] trials) was used as the index for
the one (RT1), three (RT3), and five (RT5) choice tasks. The Cognitive Laterality Battery
[...] by Gordon [...] to evaluate individual differences in hemispheric
[...] was administered to subjects in small groups [...] after all processing
speed tasks had been completed.


Results

Table 1 presents the results of a Schmid-Leiman hierarchical factor analysis in which all
factors are orthogonalized and first-order factors are residualized (i.e., their correlated variance
is attributed to the second order or general factor). As the analysis shows, the processing speed
tasks and the criterion variables loaded together on a second order factor, which was labeled
General [...]. Four first order factors, Reaction Time, Auditory Speed, Psychometric
Intelligence, and Visual Speed, also emerged.
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Salient factor loadings underlined. Correlations in the original matrix were reflected so that good
performance has been positively correlated with all other variables.
AUDLO and AUDSH - Auditory Processing Speed for long and short interstimulus intervals,
respectively. ISI1 and ISI2 - The Critical Interstimulus Interval, trials one and two,
respectively (a tachistoscopically determined nonverbal measure of speed of information
processing). TRAV and APAV - TRS-80 and Apple II microcomputer derived scores for speed of
visual processing. RT1, RT3, and RT5 - One, Three, and Five choice reaction times. HSGPA and
FRGPA - High school and Freshman Grade Point Average. SAT - Scholastic Aptitude Test (total
score). VOCAB and BD - scores on the Vocabulary and Block Design subtests of the Wechsler Adult
Intelligence Scale, Revised.
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*  nr  i  eiatinns  hdve  been  r  effected 

*  p  ns 

“  U  l,: 

* * Factor loadings for first unrotated factor matrix for Principal Components factor analysis

(total variance accounted for equals 25.[?] percent). Salient loadings are underlined.
Right and Left hemisphere functioning was measured by the Gordon Cognitive Laterality Battery.
AUDLO and AUDSH - Auditory Processing Speed for long and short interstimulus intervals,
respectively. ISI1 and ISI2 - The Critical Interstimulus Interval, trials one and two,
respectively (a tachistoscopically determined nonverbal measure of speed of information
processing). TRAV and APAV - TRS-80 and Apple II microcomputer derived scores for speed of
visual processing. RT1, RT3, and RT5 - One, three, and five choice reaction times.
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■  live.  '•<  ”'t  >  •  r  r  'if«j ;  r  ■  i,r  ■',d,eo  factor  ior  'he  measure-,  oi  processing 

"ij  'o.;r  ;.,r  - jh;  Vm’-p^r1'  in-' 'if,  mg  h .emuiatler  nl  the  table reveal"  a 

»''•  ;-3**r>-r  ■><  -  * 't  r ,y  inp-  ’hu  pr  -peed  meac  jr??  and  the  measures  cf 

sen  i-i'Mi’  imnu.d  ;.pn.  itisattv  judder.  processing  was  related  more  -tronglv  Jo  the  left 
he'r  >  .(.V'  w  ,  , ;  •  «  h-  -cog  tn  'he  r  igpf 

Discussion

The processing speed tasks loaded on the same second order factor, derived from a
hierarchical factor analysis, as did the measures of psychometric intelligence and aptitude.
This finding indicates that performance on complex cognitive tasks such as Block Design and
the SAT is related to performance on tasks that have little or no knowledge content and
require no complex problem solving. The findings thus support the theoretical view that
processing speed may be a general factor in individual differences in performance on complex
intellectual tasks.

The data further reveal two components to the variability in measures of speed of
processing: a general component that accounts for roughly half the variance and task specific
components that account for the other half. These task specific components account for a fairly
substantial share of the variance and cannot be ignored when considering the relationship
between measures of processing speed and measures involving complex cognitive processing. The
task-specific variance, moreover, may be of intrinsic interest in that any given task may be related
to important processes. For instance, the pattern of correlations revealed that the visual tasks
tended to be related to the right hemisphere related tasks on the Gordon Battery; the auditory
tasks, by contrast, were related to the Gordon Battery left hemisphere tasks. Hypothetically, it
might be feasible to construct a set of processing speed tasks both to measure general intelligence,
through obtaining a composite score in which individual differences due to task-specific abilities
average out, and to measure more specific group factors that are relatively independent of each
other and can help predict job performance. Future research in this area should be directed
toward separating the common variance from the task specific and toward determining more
precisely what each of these components measures. We conclude that:

(a) Processing speed tasks, which contain little or no intellectual content and involve little or no
complex problem solving skills, share common variance with conventional psychometric tests that
do involve complex reasoning and problem solving skills.

(b) Scores on processing speed tasks are multi-dimensional. Though a general processing speed
factor emerges from a hierarchical analysis of a set of visual, auditory, and reaction time tasks,
more specific factors emerge as well. Consequently, it appears to be an oversimplification to ask
whether any particular processing speed task is related to intelligence. Rather, more specific
questions are needed.

(c) The measurement of Spearman's "g" factor of mental ability by means of speed of processing tasks
will most likely depend on using a battery of such tasks having sufficient diversity in specific task
features to permit the averaging out of task-specific variance so that the composite score will
predominantly reflect the general ability factor that is common to all of the tasks.
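
The averaging argument in conclusion (c) can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' analysis. Each simulated task score is the sum of a general component and an independent task-specific component of equal variance, mirroring the "roughly half" split described above. The function names, sample size, and seed are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def composite_g_correlation(n_tasks, n_subjects=2000, seed=0):
    """Correlate a mean-of-tasks composite with the simulated general factor."""
    rng = random.Random(seed)
    general, composite = [], []
    for _ in range(n_subjects):
        g = rng.gauss(0, 1)  # general component shared by every task
        # Each task adds its own independent, task-specific component
        scores = [g + rng.gauss(0, 1) for _ in range(n_tasks)]
        general.append(g)
        composite.append(sum(scores) / n_tasks)
    return pearson(composite, general)


one_task = composite_g_correlation(1)
eight_tasks = composite_g_correlation(8)
print(round(one_task, 2), round(eight_tasks, 2))
```

Under these assumptions a single task tracks the general component only moderately, while the eight-task composite tracks it much more closely, because the task-specific parts tend to cancel in the average.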
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I. Abstraction is Fundamental in Science

Abstracting or modelling of phenomena is a primary method of science for increasing
knowledge. This approach is vital where the phenomenon under investigation cannot be
subjected to direct scientific manipulation, such as astronomical processes or force-on-force
combat. However, the quality of conclusions obtained from a particular abstraction, or
model, is critically dependent upon how completely and accurately the essential components
and their interrelationships are represented.

Models which simulate the process of combat have long been important to military
planners in the formulation of policy, strategy and tactics (Brewer and Shubik, 1979). The
use of modern computers has enabled the representation of an unprecedented level of detail
in combat modelling. For example, extant Army combat simulation models consist of
hundreds of thousands of lines of high-level-language computer code. This code represents
the combat process in extreme detail, down to the field of vision and magnification
specifications of the specific optics selected for every relevant weapon system at a given
point in time.

Despite such great detail incorporated in many models, larger questions remain about the
overall adequacy of the representation of the combat process. Characteristics of optical or
weapon systems can be precisely determined using experimental methodologies. Data from
such methods serve to determine with high confidence the values of many of the very large
number of parameters present in modern combat simulation models. One class of phenomena
which is not, however, so easily characterized is human behavior. As human performance
factors (HPF) are an integral part of virtually all systems comprising the combat process,
the nature of the representation of these factors in combat models has direct implications
for the validity of any results.

II. Human Performance Factors in Models

There are several issues pertaining to the representation of HPF in the modelling of
combat. Some of these can be illustrated by taking a simple behavior such as the likelihood
that a single soldier will detect a particular type of vehicle under some situation specifying
distance, atmospheric conditions, lighting, the soldier's physical and psychological states,
and so forth.

One issue concerns the accuracy of estimating the likelihood of detection. With
conventional experimentation and statistical methodology, an estimate of known certainty
can be obtained. Similar methods can yield information on the variability between individual
soldiers in detection likelihood. However, a serious difficulty arises with this approach
when the detection parameters are sought for different sets of situation contexts. Human
performance can vary not only as a function of each of the myriad of single factors but also
based upon any possible interaction of the factors. Both combinatorial explosion and
experimental impracticality become problems immediately. Thus, to rigorously establish
model parameters of relevant contexts for HPF of even a simple task is simply not possible.
Therefore, HPF must be represented in combat simulation models in simplified form, and
at some point assumptions about such aspects as the value limits on HPF variables and
the nature of interactions among variables become necessary.
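
The combinatorial explosion mentioned above can be made concrete with a short calculation. The sketch below assumes a fully crossed experimental design. The factor names echo the detection example, but the number of levels per factor and the trials per cell are hypothetical values chosen only for illustration.

```python
from math import prod

# Hypothetical situational factors and the number of levels tested for each
factor_levels = {
    "distance": 5,
    "atmospheric_conditions": 4,
    "lighting": 3,
    "physical_state": 3,
    "psychological_state": 3,
    "vehicle_type": 4,
}


def design_cells(levels):
    """Number of distinct situation contexts in a fully crossed design."""
    return prod(levels.values())


def trials_needed(levels, trials_per_cell):
    """Total detection trials if every context is sampled equally often."""
    return design_cells(levels) * trials_per_cell


cells = design_cells(factor_levels)
total = trials_needed(factor_levels, trials_per_cell=30)
print(cells, total)
```

Even this modest set of six factors yields thousands of contexts for one simple detection task, which is why simplifying assumptions become unavoidable.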

When examples of representations of HPF in combat simulation models are considered,
both the [word illegible] and detail are found to vary widely. Miller and Bonder (1982) conducted an
investigation of the HPF treatment in nine combat simulation models. Fifteen combat
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processes (e.g., communications, maneuver, and countermobility) and
[number illegible] human factors were identified, which contained [number illegible] human
factor process phenomena (e.g., decisions to request fire support, route selection of forces
moving on the ground, increase in exposure due to encountering a minefield or obstacles).
They observed that a phenomenon could be represented in a model in one of five basic forms:
(1) assuming that the phenomenon is always correctly performed, (2) representing the factor
by a human player in real time while the model is running, (3) directly inputting the effect
of the factor prior to model execution, (4) representing the phenomenon explicitly in
computer code within the model, and (5) representing the phenomenon implicitly, that is,
inclusion only as a logical necessity of some subsuming larger-scale process present in the
model. Among the nine models, the number of explicit representations of human factor
process phenomena ranged from 14 to [number illegible]. Larger-scale models, such as
theater-level simulations, are dependent upon the results of lower-level models; they
necessarily implicitly contain whatever representations of and assumptions about HPF are
contained in the models upon which they are based.


A second issue regarding the representation of HPF in combat simulation models centers
upon the extent to which the assumptions made are empirically based. While detailed
elucidation by experimentation is generally impossible, other sources of information exist,
each with shortcomings. For example, Dupuy ([year illegible]) and others have proposed greater
use of historical data. Of course, any historical account is to some extent both subjective
and highly dependent upon specific circumstances. Another source of information is the
judgment of experts in the area in question. Such persons may include combat veterans
and behavioral scientists. Problems are that experts frequently disagree and even
achievement of consensus does not guarantee accuracy. Another source of relevant
information is the data collected from the battalion-size training exercises conducted at the
National Training Center (NTC) in recent years (Fobes, 1984). This data, however, is
pertinent to only a subset of the HPF of interest and the analysis of data does present some
challenges (Whitmarsh, [year illegible]).

A final issue concerning HPF representation is establishing their relative importance in
determining model outcomes. Perhaps it is too obvious to state that the HPF which most
impact outcomes demand greater attention in terms of both the accuracy and completeness of
their representation in combat models.


III. Representation of Target Selection Behavior

The Training Performance Analysis Tool (TPAT) was developed as a part of a larger
research program ([citation illegible]) to assist in the analysis and interpretation of data contained
in the NTC database. As training resources are always constrained, they require allocation
on the basis of expected return. The current work examines how HPF are modelled in TPAT
with the purpose of identifying where training intervention is most likely to have the
greatest impact. Typically, computer-based combat simulation models are of sufficient
inherent complexity that determination of effects of variations in a particular component
cannot be assessed by examination. Instead, the model, with changes incorporated, must be
executed and the results evaluated. The scope of the current work is limited to the HPF area
of target selection, which is relatively richly represented in TPAT.

The design philosophy of TPAT (Weaver and [name illegible], in press) was based upon three
principles. First, parameters in the program are primarily based on empirical results
obtained from real-time casualty assessment (RTCA) experiments (Clark et al., 1974)
conducted by the Combat Developments Experimentation Command (CDEC). Second, the level
of detail in the simulation is restricted to only what is necessary to accurately simulate
outcomes, and the outcomes are limited to those both observable and recorded in the NTC
database. Third, Monte Carlo simulation techniques are employed to enable the study of
outcome variability. TPAT is simple enough to facilitate understanding and manipulation of
HPF without undue time and expense. Further, it may be possible to eventually validate the
findings of this investigation using the NTC database.


As implemented, TPAT incorporates three behavioral tendencies of defending
forces--friendly forces in the current scenario--which both conflict with Army doctrine and
run counter to common sense. They are, however, empirically based (Clark et al., 1974).
First, defending tanks and antitank tube-launched, optically-tracked, wire-guided missiles
(TOWS) lack target preferences concerning target vehicle types. Doctrine states that higher
threat targets should be engaged first. In the scenario this implies that the long-range
opposing force antitank weapons (SAGGERS) and tanks (KTANKS) should be preferred to
unarmed armored personnel carriers (APCS) as targets. Defenders' presumed interest in
self-preservation would also suggest that the APCS, at least at moderate to long ranges,
should be targets of secondary importance because they are of relatively little threat.

A second empirically-based behavioral tendency concerning target selection incorporated
into TPAT is that newly visible targets are preferred targets. Since new targets would, in
general, be less likely to have acquired a target, be more distant, and, therefore, be less
threatening than previously visible targets, the same doctrinal principle applies and again is
violated. The behavior also appears not to be in the best interests of self-survival.

Finally, TPAT includes the behavioral tendency of defending tanks to prefer previously
engaged targets. This preference leads to previously killed targets being frequently
engaged. This target perseveration also violates the doctrinal principle of firing at
highest-threat targets first. Also, as with the other behavior tendencies, it promotes
neither the survival interests of the individual players nor of the defenders collectively.

These three target selection phenomena are included in TPAT because they describe
actual behavior of defenders in RTCA experiments. Whether the phenomena exist in NTC
training exercises must await the collection and analysis of appropriate data. Rather than
speculate about possible psychological bases of these behaviors (e.g., information overload
(Miller, 1956), preference for novelty, need for closure, "shooting gallery mentality," and so
forth), it is more important to understand that they are empirically based and to realize that
each could potentially be modified to some degree through training. TPAT itself can be used
to provide estimates of the effects alterations in these three target selection behaviors
have upon simulated battle outcomes. The results should reveal their relative importance on
outcomes and, therefore, suggest where the largest training payoff lies.
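
One way to picture how such tendencies could be parameterized is a weighted selection rule. The sketch below is purely illustrative and is not TPAT's actual algorithm, which the source does not describe. The `Target` class, the `selection_probabilities` function, and all weights and bonuses are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    vehicle_type: str
    newly_visible: bool = False
    previously_engaged: bool = False


def selection_probabilities(targets, type_preference,
                            new_target_bonus, perseveration_bonus):
    """Probability of choosing each candidate, proportional to its weight."""
    weights = []
    for t in targets:
        w = type_preference.get(t.vehicle_type, 1.0)
        if t.newly_visible:
            w *= 1.0 + new_target_bonus      # preference for newly visible targets
        if t.previously_engaged:
            w *= 1.0 + perseveration_bonus   # preference for previously engaged targets
        weights.append(w)
    total = sum(weights)
    if total == 0:
        raise ValueError("no selectable target")
    return [w / total for w in weights]


candidates = [
    Target("tank-1", "KTANK"),
    Target("missile-1", "SAGGER"),
    Target("carrier-1", "APC", newly_visible=True),
]

# Baseline: no preference by vehicle type, but newly visible targets favored
no_type_preference = {"KTANK": 1.0, "SAGGER": 1.0, "APC": 1.0}
baseline = selection_probabilities(candidates, no_type_preference, 1.0, 1.0)

# Modified: APC desirability set to zero and both bonuses removed
threat_based = {"KTANK": 1.0, "SAGGER": 1.0, "APC": 0.0}
modified = selection_probabilities(candidates, threat_based, 0.0, 0.0)
print(baseline, modified)
```

In this toy case the newly visible APC draws half of the baseline selection probability, and none once the preferences are altered. The sketch omits the range and other-candidate conditions that the paper attaches to the APC modification.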

IV.  Modification  of  Modelled  Target  Selection  Behaviors 

TPAT was run employing a digitized portion of NTC terrain with an approximate
battalion-sized force (1[?] tanks and 1[?] TOWS) in defense and an approximate regiment-size
attacking force (30 KTANKS, 30 SAGGERS, and 30 APCS) using rapid approach tactics. To
assess the effects of the three behavioral assumptions upon model output, each of the
behaviors was modified. The preference for newly visible targets and target perseveration
were both reduced by 100 percent. Preference for target type was manipulated by reducing
desirability of APCS to zero when there were other target candidates and when the range of
APCS was greater than [?]00 meters. Target selection assumptions were modified for
defending tanks and TOWS.

The effects of modifying each of the three target selection tendencies of the defending
force are summarized in Table 1. The model scenario was executed fifty times for each
target selection modification. Casualty ratios are shown for both the offensive and
defensive forces broken down into inflicted and sustained components. The values for
casualties inflicted can be interpreted as the mean number of vehicle kills per vehicle per
trial. The casualties sustained values can be interpreted as the mean probability across
trials and across vehicle types of a vehicle being killed.
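
The idea of a casualties-sustained value, a kill probability averaged over repeated model executions, can be sketched with a toy Monte Carlo loop. This is not the TPAT model. The engagement is reduced to an independent kill chance per vehicle, and the force size, kill probability, and seed are hypothetical. Only the count of fifty executions comes from the text.

```python
import random


def run_trial(rng, n_defenders, kill_probability):
    """One replication: number of defending vehicles killed in a toy engagement."""
    return sum(rng.random() < kill_probability for _ in range(n_defenders))


def mean_casualty_probability(n_trials, n_defenders, kill_probability, seed=1):
    """Mean probability, across trials and vehicles, of a vehicle being killed."""
    rng = random.Random(seed)
    killed = [run_trial(rng, n_defenders, kill_probability)
              for _ in range(n_trials)]
    return sum(killed) / (n_trials * n_defenders)


# Fifty executions, as in the text; 20 defenders and p = 0.5 are made-up inputs
estimate = mean_casualty_probability(n_trials=50, n_defenders=20,
                                     kill_probability=0.5)
print(round(estimate, 3))
```

Because each execution is random, the estimate varies from run to run, and averaging over the replications gives a stable summary value of the kind reported in Table 1.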

The results in Table 1 indicate that eliminating either the perseveration on last target or
the preference for newly-visible targets has virtually no effect on casualty values in
comparison to outcomes from the original model. However, the conclusion is not warranted
based upon the evidence at hand that they have no effect upon performance outcome.
Using other scenarios or alternative terrain may reveal they too affect performance. At
present, it is appropriate to conclude only that these behaviors have no impact within the
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current scenario context. By contrast, reducing the preference for APCS as targets produces
strikingly different performance. Because defenders concentrate firepower on KTANKS and
SAGGERS, these high-threat attackers are attrited at a relatively higher rate. This leads to
a reduction in the casualties the attackers are able to inflict. In turn, this produces the
result that the likelihood of a defender being killed falls from .50 to .34. The values
represent a decrease in per vehicle risk of approximately a third.
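
The "approximately a third" figure follows directly from the two reported probabilities. A minimal check, with a helper name chosen only for illustration:

```python
def relative_reduction(before, after):
    """Proportional decrease from `before` to `after`."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (before - after) / before


# Reported defender kill probabilities: original model vs. reduced APC preference
reduction = relative_reduction(0.50, 0.34)
print(f"{reduction:.0%}")
```

The absolute drop of .16 on a baseline of .50 is a 32 percent relative reduction in per vehicle risk.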

The finding that reducing the attractiveness for APCS as targets produces fewer
defender losses provides unambiguous support for the doctrinal principle that targets of
greater threat should be engaged first. Also of interest is the supposition that target
selection behavior is probably amenable to modification through appropriately designed and
conducted training. While target discriminability probably has inherent limits, especially at
extreme range, it is plausible to expect some improvement with training.

A series of TPAT runs was conducted to explore how much effect upon outcomes might be
expected from varying degrees of reduction in the preference for unarmed APCS as targets.
In addition to the preference level of APCS in the original model and the previous 100
percent reduction, their attractiveness was reduced by 25, 50 and 75 percent. Fifty runs
under each condition were performed. Figure 1 presents the results displayed as mean
probabilities of defensive vehicles becoming casualties. The specific values are less
meaningful than the overall pattern of the relationship. Finding a nonlinear relationship
underscores the need for actually making model changes and executing the model to
determine effects. In addition, the direction of the nonlinearity observed suggests that
small amounts of reduction in the preference of APCS as targets yield little reduction in
risk--a 25 percent reduction from the original model produces no risk reduction. The
relationship suggests that only when there is 50 percent or greater reduction in preference
are substantial benefits realized. This finding suggests that while relevant training may
reduce risks to defenders, incomplete or ineffective training may produce no benefit.


V.  Conclusions 

Clearly, assumptions made about HPF can affect the outcome of simulated combat. This
finding has training as well as analysis implications. Investigation of HPF effects in combat
models can be important to the identification of training needs, to the planning of training,
and to the development of training criteria goals. Model developers may be aware of the
HPF assumptions and simplifications that have been made and the consequential limitations
upon model outcomes. It is also vitally important for policy decision makers, the model
users, to know of these assumptions and caveats. Therefore, the value of models is
enhanced when these considerations are explicitly documented.

Figure 1. Effect of percent reduction in defender target preference for APCs
upon mean probability of a defending vehicle becoming a casualty.

Further systematic research on the HPF representation in combat simulation models is
necessary. Several of the specific needs are (1) development of strategies for gathering
relevant empirical data, (2) determination of the most important HPF, and (3) identification of
methods for better incorporating the effects of HPF into combat simulation model findings.
Introduction

Depending on whether a country chooses to man its armed forces on
the basis of voluntary or obligatory service, the strategic forecasting of
human-resource supply and the planning of human-resource needs will
require different emphases. Conscript forces, which are more or less
ensured a reasonable supply of recruits and a representative share of the
available youth talent, are primarily concerned with personnel screening
and allocation functions. All-volunteer forces (AVFs), on the other hand,
which are particularly susceptible to the effects of free-market economics
on the quantity and quality of the available manpower pool, must devote
considerable attention to the business of attracting applicants before
practical screening and allocation policies can be implemented. Moreover,
because AVFs are dependent on the willingness of young people to serve in
the military, special attention must be given to any qualitative biases in
the composition of the target recruiting pool and recruit intake
introduced by the market forces which directly affect enlistment
propensity. (For a more comprehensive discussion of these issues, see
Cotton, Crook & Pinch, 1978; and Pinch, 1982.)

While there are a number of dimensions on which qualitative
comparisons among recruit cohorts can be made, military staffs are
typically concerned with the trainability of recruits, that is, their
potential to acquire those job skills which are a necessary, but not a
sufficient, condition for operational effectiveness. In the Canadian
Forces (CF), the general indices of quality and trainability most commonly
invoked are entry scores on the General Classification (GC) test (a
group-administered intelligence test) and educational attainment. Although the
relationship between these indices is not entirely orthogonal, GC test
scores can be essentially interpreted as measures of learning potential,
while educational attainment levels provide measures of acquired
learning. Trends in the quality of Other Ranks (OR) recruits over the
1975-83 period will be examined in these terms.

Trends in OR Recruit Quality: 1975-1983

As illustrated in Table 1, the average GC-test scores of OR
recruits and the distribution of these scores have been remarkably
consistent over the 1975-1983 period. Because the GC test is
a standardized test, it seems reasonable to conclude that CF recruits

The views and opinions expressed in this paper are those of the author and
not necessarily those of the Department of National Defence.
w.t.  i.i :  ;>!■>■■>  i’.tr.'  eiucitlon  evlie-t.  In

:  > 'v ir -  1  •  t'1  o.mc  :  nr^  f  Lguro.s  t  >r  t.e  daiaiim  pop”.  L  it  ion ,  fF  enr.i  Lments 
i.  ■  -!*  .t  i'  .-e  :,ic  tt  Lor.  ”.>up  . v .••  s  : i  f te ;  o.’or  time  r  roc  ov  >r- 

re.'.vso'it  it  to  1  t  o  t.e  tiore  fi/oor  nu'  ,>  o  1  uaier- represent  it  ion. 

->I..,i’..,  I,  t  .e  a  i  ‘  «  _>rs ;  t . -i*  :uc  i  t  ion  tie  !•’  ms  maint-iine  i  a  very 

:  ii.iur  nic  -•>-->-u  L  t  in  position,  mi  i  is  ,  In  Met,  Lnprove  i  tie  extent  of 

■ .  i  ' ,  ij ,  e  >  i  *  » t  i  t  *  >  i*  *  it  L  ->  i:i  t  .i  •  : 1.  .  1 .  c  roups ,  i  owe  ve .  *  w  j  it  i  are  tie 

s.mr  >  -.  t  ■  .•  i’ll  v  >:  its  recruits,  t  t  t  .e  us  ui  less  success, 

i '  i  ’ t  *  . .  i .  *  t > 1  , .  u  o /o ’  —  r e pres ei 1 1 1  i  it  tii*  soi  or.  1 1 -  .  t*  uic  it  lOii  . eve  1  to  1 

oi  ;st lot  1 1 1 1  -  T;  ie r -represents i  it  the  post- sec on  i  irv  level. 


>  i .  i 

•  < 1 .  i  s  i  j  e  i 
<•  :u  ■  iti 
'  '  inlet  to  , 


>  ^  t  ,e  ■  frou  sources  tilt  ru«*.',  1  lo : ;  Par<\,  '  )12)  w:i  cti  ie.il 

wit  i  f..  e  :>ic  it  ion  1 1  u  t  t ;  meitt ,  .mi  w  ileo  break  out 
ill  il  i'VU  i  inures  it  terns  progr  i  n  -cup! *t Lon  mu  non- 

pro /Lie  i  n.'re  use-'ul  ,  n  less  emplete,  picture.  While 


Encouraging as is the general shift towards higher proportions of better
educated recruits and lower proportions of the less educated, it also
reveals certain weaknesses in the CF's recruiting posture. Recruit
intakes continue to be over-represented by non-high school graduates and
under-represented by graduates of post-secondary schools and universities.
These trends are reason for concern in view of the training demands
imposed by the proliferation of complex technology in all areas of
military functioning and in light of the strong relationship between high
school completion and early career survival (Sinaiko & Scheflen, 1981).


Table 3


Percentages of CF OR Enrollees at Various Education Levels
Compared to Population Percentages


Education Level


Non-High School Grad
CF  Cdn Pop


High School Grad
CF  Cdn Pop


Post  Sec  &  Univ 
CF  Cdn  Pop 


If’) 

d. 

i  37  ) 

1  9do 

'•  i 

.93  i 

;  m_ 

Vi 

19.3o* 

*  Pro  ji 

'  t  e  i 

l)  ’  S,  US’ 

,  >it 

f.H  i  i 

pro, ip  t 

it  la. 

ll  .  Til  ,  . 

■  oi  ‘i . 

,  ni 

•  s  t  U  il 

|  U  l  .  It 

.  i,I  >  e. 

17  , 

2  ) 


?  ! 


-52 


39., 


19 


3h« 

371 

42% 


Do the apparent general gains in the educational


247 


js® 


m 


o  4 


% 


6 


/pUilS. 


S  l  <i 


U'V 


i>*^  t-  >«**,  • »  i :  i : 

r  lA  1  i  n  vsl 


?  »  ‘  u  , 
>;e  iv 

u 

*  r'  1 1 L 


v. 


t  io:.s 


i  r  r  : 
t  .  it  , 

rn  a' . 


<  o  t4>  * 
x  4|><»r 
*>>>  .»»» 


>.11 
I  *  C 

>  1?*  V 


-  i.  1 1  /  ‘ 


^  i .  r .  . 


* ..mite; 


:  .  eiuc  it. on  t,  r:>  m  Oi  ions  ions 
‘ ,i  t  .o  Cr  is  a  c  liver  option 


.Uses  wit  .  ;  i  ;  .o  r  mmls  >:  e  iuo  it.  L>  i  ul  tut,  non;; 

"m.te:  stu'o’.ts  i  •*  pirtii'iK.ir,  t  .e  is  relative  I  v 

i  i  L- it '.I  i’  > :  e  ip!  >  no  it  ipportun  it .  .  for  t  »ose  *ii)»h 

i*  *  .>  >  i  .’e  tie  ,  >  j  t  en  1 1  i  i'll  ii  >  1 1  e  to  u'  ,u re  t  ec  .  in  1  c  1 1 
’Up....,  tie  •  -  .  is  .’i-rv  little  to  .liter  in  tie  rf.iy  of 

•.’.'ill,  ,r  -~,v  i  i'.  ioie.tlve  proqr  i.  is ,  oil  it  is  in  this 

e  -i.5  o  *e  .  lispiii.  *:  is  a  .ni]>i  teem  i' a  1  t r  lining 

t  .i *  ui  irti'u  i  t  r*‘  ’it  . 1 1  institutes  an  i  cm i nun  i  l  v  c  o  L  i  e  v;e  s 
it>  ..elr;  iurlr.-t  t..e  1  ^o!s  m!  1  )7"s  iPinou,  1982;  ki'lenemy, 
e-iO'-'t  to  t‘.<>se  w  ."  h  :  o  i :  ■  sol:  post-j>.-i  u.'.nr.-  qualifica- 
:•■><•>  not  uve  a  L  ite  •  u-e  it rv  program  for  s < i  Lie  i  ipplicants 
*’et  .avv  i  .'i.'e.-r  it'si  tore  i i  hi  spe*  'tiorl'.v  rewar  is 
sr>ec  i  1 1  i  :  .t  ion  ■  ill  ■  up  i  prorrsm  to  iiiress  tnl.s  latter 
love  ••pie  it;.  'ver>’.l,  tierefit'e,  it  must  ho  conoLu  led 


il  l  !**">  -5  t  t<-  r  1  s  l  *« 

it  f'  i^  i  jl'n  0  IP.  tit.* 

y  '  "  >  c.  >  a  *  ♦  i  ■  p  />  V 

r  i  s  u  to  t i vi  11  in 


lU.'il  if  i  mil,;  pro,;r  i..is  , 
a  i  !e  ..lore  at  tractive  to 
„lsts ,  in  i  o  in  h ?  aria 
e  iucat  ion  1 1  n  i  career 
I  is  not  lik-lv  to  tie 


career  structures,  an  i 
aspirinp,  or  quail  fied 
c^.upp  t  i,  live  if* 

opportunities,  this 
aiequatolv  tappe  1  for 


ml  source  if  personae 
i  i.ie  t  •  i  r  sin' . 


The question of apparent versus real gains in educational quality
among OR recruits is somewhat more difficult to answer directly. As a
point of departure though, it should be noted that the pass/fail criteria
used in schools and which, to a large extent, affect the rate and extent
of school advancement, can vary considerably over time, particularly when
institutional survival is threatened by declining enrolments. This
flexibility of standards and the resulting difficulties in making absolute
comparisons in any time span should also be viewed in the light of recent
public controversy in the United States and Canada over the declining
quality of education. Research on trends in scholastic achievement scores
(standardized, and hence comparable, measures) strongly indicates a decline in
the academic competence of youth (Waters & Laurence, 1982). What these
observations collectively suggest is that neither the absolute nor
relative quality of education in recent years is what it used to be.

With respect to the sustainability of, or improvement upon, the
present state of recruit educational quality, future developments will
depend on a number of factors, many of which have contributed to the past
several years of good fortune. Since the onset of a national economic

fe.  .•  >  >ioii  ii  I'd:,  tne  n  it  ion  )I  une..ip:.)  viient  rates  in  tne  work  fo-ee  hive 
ra  i,,;e  i  netwe.i  L),  mi  13,,  .-'it, i  tie  rites  runnin,_  soiaewiiat  ui;;iier.  at  10“'. 
to  !>  ,  l  >•  to  ei.it  i  popu!  it.  i  on  iSiilistus  hinaii,  l't.3)).  Lonsequenl  i  y , 

t  ie  e !  loot*,  ,[  tie  )’e|\  ietlO^r  \  yi  j .  prop.e-c  i  <>  ,  of  tne  lite  1970s  li  ive 

•ee  .  teipo.o.  postpone,.  >vej  t.n  s  inie  t  i  .a  ■  periol,  la  )OUr-in<'l.‘KO  l 

coalitions  uv..  ui  i  sunstant i  1 1  inpa't  on  tie  vo’untnv  attrition  of  cF 
eu  ,t,e‘.  A  ,  ;  epor  to  !  b  j  Ion  ie  ;  ail  i  ,von  i  1  38  V ) ,  i  L  t  >  1 1  i  on  fell  oft 

ripiiiv  i  'oin  13/ a  to  |)8  3  a.  une  ipio  ,ae.i  t  rose  (r~-.?4),  mi  annua  L 

i  e,  i  u  ,  L  .  i  |  i*  l  is  tell  li'i  o|  !i  i,;l  i  ,  h  .'mj'i  i  lie  i  e  I  I  ei  t  s  o !  in  '  'li'i'c.llii  I  ; 



supply, brought about by high unemployment in the 13-24 year-old segment
of the population, and of a decreasing demand, brought about by declining
attrition, are best seen in the applicant and enrolment figures for
this period and the corresponding dramatic improvement in CF selection


Total Number of CF Other Ranks Applicants and
Enrollees for Fiscal Years 1979/80 to 1983/84

Fiscal Year    Applicants    Enrollees    Selection Ratio

1979/80        2s,-c>8       li,3io       1:2.8
1980/81        jo,2-*J       L1,36b       1:3.0
1981/82        39,o2o        12,261       1:3.2
1982/83        ',n 1 a       6,b74        1:6.0
1983/84        29,/3-3       4,038        1:7.4

It would be extremely surprising if, under these conditions,
recruit quality were anything but high. Yet because current CF recruit
quality is largely an artifact of a recent national economic downturn, it
is sobering to think what might happen if there is an upswing in the
economy. On the supply side, declining unemployment in the youth
population and the intractable effects of demographics will rapidly shrink
the eligible personnel pool. On the demand side, a surge in voluntary
attrition, as external employment opportunities materialize, will probably
drive up recruiting quotas to or beyond pre-recession levels. The first
impact would be on quality. Numbers could also be a problem.

Without forward-looking recruiting strategies and personnel
policies, particularly those designed and preplaced with a view to coping
with a manpower-shortage contingency, the CF runs a real risk of suffering
a strategic self-inflicted wound. The policy areas which require urgent
review in order to forestall this eventuality include the following:

a. redefining the eligible manpower pool, which would minimally
entail the removal of age barriers, the expansion of women's
roles, and the development of lateral-entry programs for
skilled applicants;

b. improving the fit between military training and civilian
education, which could mean increased reliance on
"off-the-shelf" skills and the establishment of CF-sponsored
programs at technical colleges;

c. developing the recruitment potential in the military-
socialization infrastructure, which basically means providing
more support to cadet and reserve organizations; and, finally,

d. exploring the possibility of a national service scheme, which
might vary in application from the institution of a national
public-service draft, which includes military service as an
option, to a comprehensive incentives-and-benefits program for
military service.
Role, Importance and Availability of
GFAF  Reservists 

Heinz-J.  Ebenrett 

Federal  Armed  Forces  Office,  Bonn,  FedRep  Germany 

presented by
Hans  Kuessner 

Federal  Armed  Forces  Office 

The  Mission 

The  mission  assigned  to  the  German  Federal  Armed  Forces  (GFAF)  by 
NATO  requires  the  three  Services  -  Army,  Air  Force,  and  Navy  -  to 
render  an  essential  and  decisive  contribution  to  the  collective 
defense  of  Central  Europe  and  the  Baltic  Approaches. 

With its 345,000 soldiers on active duty, the Army contributes
the largest share to the total strength. The Air Force will grow
from almost 111,000 active airmen in peace to about twice that
size in war, while the Navy will increase its present complement
of 39,000 sailors 1.75 times. These strength figures can only be
achieved by armed forces that are structured as a conscript army
with a high percentage of temporary-career and regular personnel
and that are supported by strong reserve personnel.

Reserve  Personnel 

Since the inception of the GFAF, 5.2 million men have performed
military  service  in  the  "Bundeswehr".  2.4  million  of  them  are 
still  available  for  military  purposes  due  to  age  and  level  of 
training. 

At  present,  762,000  reservists  are  included  in  plans  for 
assignments  which  must  be  filled  by  mobilization  to  achieve  the 
full  defensive  capability  of  the  armed  forces.  Of  these  762,000 
reservists  firmly  planned  for  mobilization,  40,000  are  officers, 
196,000 NCOs and 526,000 privates.

In the next two years, it is intended to raise these numbers by
90,000 in order to be able to perform the task of the Wartime
Host Nation Support (WHNS) Program. As a result, wartime
strength will grow to 1.34 million service personnel, beginning
in 1987.
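
The strength figures quoted above can be cross-checked with simple arithmetic. The following Python sketch is illustrative only: the variable names are hypothetical, and it assumes that wartime strength is the sum of the active-duty figures, the reservists planned for mobilization, and the intended increase of 90,000, a decomposition the text does not state explicitly.

```python
# Figures as quoted in the text. Treating wartime strength as the simple
# sum of these components is an assumption made for illustration.
active = {"army": 345_000, "air_force": 111_000, "navy": 39_000}
reservists_by_rank = {"officers": 40_000, "ncos": 196_000, "privates": 526_000}
planned_increase = 90_000

# The rank breakdown should reproduce the 762,000 planned reservists.
planned_reservists = sum(reservists_by_rank.values())
active_total = sum(active.values())
wartime_estimate = active_total + planned_reservists + planned_increase

print(planned_reservists)  # 762000
print(active_total)        # 495000
print(wartime_estimate)    # 1347000
```

Under these assumptions the rank breakdown matches the 762,000 total exactly, and the estimate comes out close to the quoted 1.34 million. The small excess is unsurprising, since the Air Force figure is given only as "almost 111,000".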

Use  of  Reserve  Personnel 

Army  and  Air  Force  are  currently  providing  for  60,000  reservists 
in  the  "Standby  Readiness"  component.  In  a  crisis,  the  Federal 
Minister  of  Defense  can  recall  standby  readiness  personnel  to 
active  duty  in  order  to  improve  the  operational  readiness  of 
specific combat units of the Army and the Air Force. Following a
decision by the Federal Government, the men in the Alert Reserve
are recalled in the course of mobilization measures. These men
will round out units and agencies of all the Services, assume
tasks designed to maintain the operational freedom of the armed
forces, and support the allied forces. The balance of the
reservists firmly included in the plans will be available to
major units as replacement personnel.

The  requirement  of  the  armed  forces  for  reserve  personnel  can  be 
met  in  terms  of  numbers,  but  there  are  still  some  deficiencies  as 
far as quality is concerned, since it has so far been impossible
to  train  all  reservists  during  their  periods  of  active  military 
service  for  their  particular  wartime  assignments.  They  must  be 
given  additional  training  during  reserve  duty  training  periods. 

Reserve Duty Training

Training  and  extension  training  of  reserve  personnel  is  provided 
by  a  system  of  reserve  training  periods  that  is  geared  to  the 
needs  of  wartime  operational  readiness: 

Individual reserve duty training periods are designed to
provide extension training to the individual reservist. Their
normal duration is two to four weeks. Reservists
often volunteer for them.

Mobilization exercises of up to 12 days' duration are
intended primarily for the training of those elements
which in peacetime exist merely as equipment holding units
with no personnel or as cadre-strength units.

Mobilization  alert  exercises  are  conducted  without  warning. 
They  last  up  to  three  days,  and  their  purpose  is  to  practice 
mobilization  procedures. 

The  importance  which  the  armed  forces  attach  to  the  reserve 
component  will  continue  to  grow  in  the  years  to  come.  The 
reservist  concept  is  currently  being  updated  in  connection  with 
studies  under  way  concerning  the  structure  of  the  German 
forces.  The  goal  is  not  only  to  continue  to  meet  the  numerical 
requirement  of  the  forces  for  reserve  personnel,  but  also  to 
improve  the  quality  of  training  through  organizational  measures 
and  to  ensure  that  the  burdens  involved  are  distributed  to  all 
reservists  as  equitably  as  possible.  Spaces  for  reserve  duty 
training are to rise to 6,600 by 1986. This will make it
possible, for the first time, to recall more than 200,000
reservists per annum for reserve duty training periods and to
narrow the still existing gaps in their training.
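
The relation between training spaces and the number of reservists recalled can be sketched numerically. The Python example below is a rough illustration, not a statement of GFAF planning practice: it assumes that one "space" corresponds to a training slot occupied for a full year, which the text does not define, and the function name is hypothetical.

```python
DAYS_PER_YEAR = 365  # assumption: one "space" is a slot occupied all year


def avg_days_per_reservist(spaces: int, reservists_per_year: int) -> float:
    """Average training days per recalled reservist if all spaces are used."""
    return spaces * DAYS_PER_YEAR / reservists_per_year


# 6,600 spaces shared among 200,000 reservists recalled per year.
days_1986 = avg_days_per_reservist(6_600, 200_000)
print(round(days_1986, 1))  # 12.0
```

On this reading, each recalled reservist would train for about 12 days on average, which is of the same order as the mobilization exercises of up to 12 days described above. Because the text speaks of "more than" 200,000 reservists, the figure is an upper bound under the stated assumption.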

In the nineties, the number of spaces for reserve duty training
will gradually be increased to 15,000. These measures will enable
up to 400,000 reservists per year to be recalled for reserve
duty training periods and to be trained for their wartime
assignments. Reserve personnel must then expect to be recalled
more frequently and perhaps for shorter periods of time. Such a
conscripted program would pose an additional burden both on the
individuals concerned and on industry and the economy.

Model  "Reservist  Volunteering  by  Particular  Declaration" 

There are doubts that these far-reaching plans (mentioned above)
can be realized, especially with regard to the readiness of
reservists to participate repeatedly in extension training
periods.


In order to attract qualified reserve personnel for matters of
intensified training, a special model has been developed. This
model is called "Reservist Volunteering by Particular
Declaration". It consists of the following elements:

1. The reservist formally declares that he will repeatedly render
(in addition to obligatory reserve duty training)
mobilization exercises and training periods of at least
28 days annually (maybe less when realized) for a minimum
of 3 years.

2. The status of the "Volunteering Reservist" will be that
of a conscript, and his payment shall be fixed according to
his civilian income.

3. With respect to dates and terms of exercise, the reservist
is offered the chance to come to an understanding with his
mobilization unit.

4. As to incentives, the reservist may expect development
and promotion in his individual military career.


The model primarily aims at leadership personnel in mobilization
units up to the level of a battalion commander. The
"Volunteering Reservist" will generally receive military training
in the same mobilization unit. The purpose of this is to enhance
unit cohesion among reservists as a way of increasing unit
effectiveness. Additionally, it is expected that the commitment of
the "Volunteering Reservists", i.e. their understanding of and
response to the common defense duty, will have a distinctly
positive influence on fellow citizens.

(Supplementary to the model mentioned above, there are
reflections about the introduction of a status for part-time
soldiers. Preliminary considerations aim at civilian
operators, mechanics, technicians etc., who can make it
possible to serve as part-time soldiers; for example one
day per week or several days a month. That type of
"reservist" would be earmarked to move and maintain military
equipment and material of those elements which in peacetime
exist merely as equipment holding units with no personnel or
as cadre-strength units. Nevertheless, at the present time
there is no legal basis for part-time soldiers in Germany and
therefore this model has not yet exceeded the status of
preliminary considerations. If at all, it may be carried on
when the model concerning the "Volunteering Reservists" has
been realized.)

Inquiry of the Degree of Acceptance

The decision whether the "Volunteering Reservist" model will be
realized or not is still pending. Inter alia, it depends on
the degree of acceptance among the reservists. In order to get
valid information for prognosis and planning, the
Psychological Service of the GFAF has been assigned the mission
of determining this acceptance by means of a representative
survey.

Currently a survey of a representative sample of 2,500 reservists
is being conducted. Results will not be available before the end
of the year. Nevertheless, some preliminary trends can be drawn from
pretest data. The following preliminary data may be of some
interest.

Primarily to test the suitability of the questioning
instrument, a sample of 182 reservists in leadership
functions, who happened to be rendering reserve duty training in
August 1985, were asked about their interest in the
"Volunteering Reservists" model. The distribution of answers to
the central question shows a distinctly high degree of acceptance:


Interest in Participating (N)

                 total    NCO's       Senior    Officers
                          or cand.    NCO's

"yes"              74       33          16         25

"uncertain"        22

"no"               85

(1 missing)                 63          30         15

74 of the 182 reservists (= 41 %) stated an individual interest,
the officers in the sample even by majority.
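
The percentage quoted above, and the internal consistency of the table, can be checked with a few lines of Python. This is a hedged sketch: the variable names are hypothetical, and the reading of the three unlabelled figures (63, 30, 15) as per-subgroup counts of all respondents who did not answer "yes" is an assumption suggested by the arithmetic, not something the source states.

```python
# Response counts as printed in the table.
responses = {"yes": 74, "uncertain": 22, "no": 85, "missing": 1}
total = sum(responses.values())

print(total)                                  # 182
print(round(100 * responses["yes"] / total))  # 41

# The three subgroup counts on the "yes" row should add up to 74.
yes_by_subgroup = [33, 16, 25]
assert sum(yes_by_subgroup) == responses["yes"]

# Assumption: 63, 30 and 15 are the non-"yes" counts per subgroup.
other_by_subgroup = [63, 30, 15]
print(sum(other_by_subgroup) == total - responses["yes"])  # True

# Under that assumption, share of "yes" in the last subgroup (25 of 40).
print(100 * 25 / (25 + 15))  # 62.5
```

If the last column is indeed the officers, the 62.5 percent figure agrees with the statement that officers expressed interest by majority. The result depends entirely on the assumed table layout.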

Reasons for Rejection

With respect to those subgroups who showed "no" or "uncertain"
interest (N = 85 + 22 = 107), it is obvious that the predominant
reasons given for rejection referred to occupational claims or
hindrances. Claims of family were of minor but significant
importance. Although one in three of the uninterested reservists
also expressed doubts about the purpose and sense of military duty,
it can be said that the main reasons for rejection are
objective hindrances in the occupational and personal sphere
rather than negative attitudes or reservations towards military
duty:

Reasons for Rejection (multiple answers)

Subgroups: "no interest" (N = 85) and "uncertain" (N = 22)

                                 "primarily"   "important"   (others)

occupational claims                   63            18           26

reservations of the boss              28            23           53

occupational drawbacks                30            15           62

claims of family                      31            27           49

reservations of the spouse            21            26           60

doubts in purpose and
sense of military duty                20            16           71
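
The weight of each reason for rejection can be expressed as the share of the 107 respondents who rated it "primarily" or "important". The Python sketch below is illustrative: the dictionary layout and names are hypothetical, and combining the two rating columns into one share is a choice made here, not a measure reported by the author.

```python
N = 85 + 22  # "no interest" plus "uncertain" respondents

# (primarily, important, others) as printed in the table.
reasons = {
    "occupational claims": (63, 18, 26),
    "reservations of the boss": (28, 23, 53),
    "occupational drawbacks": (30, 15, 62),
    "claims of family": (31, 27, 49),
    "reservations of the spouse": (21, 26, 60),
    "doubts in purpose and sense of military duty": (20, 16, 71),
}

shares = {}
for name, (primarily, important, others) in reasons.items():
    shares[name] = (primarily + important) / N
    row_total = primarily + important + others
    print(f"{name}: {shares[name]:.0%} (row total {row_total})")
```

Occupational claims come out at about 76 percent, and doubts about the purpose of military duty at about 34 percent, in line with the "one in three" mentioned in the text. Five of the six rows sum to 107; the row for reservations of the boss sums to 104 as printed, which the source does not explain.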

Reasons for Volunteering


As for the subgroup of those reservists who showed interest in
the model, it is noteworthy that the actual source of that
interest seems to be a positive attitude towards the military in
general. The overwhelming majority of the interested
reservists reported a noticeable rate of contacts with the armed
forces, positive experiences during terms of service, and the utility
of military training in civilian business. The reasons
which may attract them to consider participating in the
"Volunteering Reservist" model are given in the following:

Reasons/Conditions for Volunteering

Subgroup: "interested in ..." (N = 74)

                              "very important"   "important"   (others)

agreements upon dates                53               18            3

good morale in the unit              36               31            7

suitable duties                      36               23           15

military leadership
development                          34               25           15

promotion in rank                    20               27           27

actions near home                    25               14           35

financial incentives                 15               30           29

The condition of coming to an understanding with the mobilization
unit upon the dates and terms of training periods is of utmost importance.
Nearly all of the interested reservists wanted to be able to plan and
determine duty terms in advance. If that is ensured, they seem to be more
attracted by idealistic views and incentives (i.e. morale, duty, leadership)
and less by objective advantages (i.e. promotion, payment, short distances).
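
The contrast between idealistic and objective incentives can be made concrete by averaging the "very important" counts in each group. The following Python sketch is only one possible way of summarizing the table. The grouping follows the parenthetical lists in the text; the names are hypothetical, and the "very important" count for unit morale is taken to be 36, the value that makes that row sum to N = 74.

```python
from statistics import mean

# "Very important" counts among the 74 interested reservists.
idealistic = {
    "good morale in the unit": 36,   # assumed value, see lead-in
    "suitable duties": 36,
    "military leadership development": 34,
}
objective = {
    "promotion in rank": 20,
    "financial incentives": 15,
    "actions near home": 25,
}

idealistic_mean = mean(idealistic.values())
objective_mean = mean(objective.values())

print(round(idealistic_mean, 1))  # 35.3
print(round(objective_mean, 1))   # 20.0
```

On this summary, idealistic incentives were rated "very important" by roughly 35 of 74 respondents on average, against 20 for the objective advantages. Both remain well below the 53 recorded for agreement upon dates.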

Summary 

Although the data may not be considered representative, they are suggestive
of certain tendencies, which can specifically be verified in the main study.
With respect to the far-reaching aims of the "Volunteering Reservist" model, it
is hoped that both substantial trends mentioned above may be confirmed,
i.e. the sufficiently high degree of acceptance as well as the observation that
primarily reservists with idealistic views and motives are attracted.
Their commitment and understanding of duty in repeatedly rendering
voluntary training periods may therefore justly be expected to have a lasting
positive influence on the defense motivation of draftees and civilians alike
in their social surroundings.


Retirement  Readiness  as  a  Function  of 
Transition  Assistance  and  Trade 
Major  F.  P.  Wilson 

Canadian  Forces:  Director  of  Personnel  Selection, 
Research  and  Second  Careers 


Retirement normally represents the end of routine daily employment in
order to earn one's living. Mid-career change, on the other hand, involves
leaving one job, either voluntarily or involuntarily, with the expectation
of undertaking other employment. Although leaving the military after
serving 20 or more years is commonly referred to as retirement, the majority
of Canadian Forces (CF) leavers go on to second careers (Pinch & Hamel,
1978). Better insight into the problems of military mid-career change can
be acquired by considering this phenomenon from the perspective of
developmental and vocational psychology theories.


Research completed by Levinson, Darrow, Klein, Levinson and McKee
(1978) suggested that there is a human life cycle made up of a series of
stages and substages, with everyone more or less passing through this
process. Levinson et al. (1978) and others (Lowenthal, Thurnher &
Chiriboga, 1975; Neugarten, 1977; Vaillant, 1977) view the 40 to 50 age
period as being particularly volatile, frequently marked by personal
crises. Career and occupational concerns play a significant role in
theories of adult development. Personal problems emanating from the
workplace can gain overriding importance in a person's life. Medical
research in the US Forces identified the "retirement syndrome" (Berkey &
Stoebner, 1968; Druss, 1965; Greenberg, 1965; McNeil & Giffen, 1967;
Milowe, 1964) which is characterized by both social and intrapsychic
problems. These writers have posited that the social difficulties stem
from: the financial inadequacy of the annuity, which must be supplemented
by other income from a second career in the civilian labour force;
indecision as to where to settle upon leaving the forces; loss of friends
with similar jobs and common interests; and perceived difficulty in finding
employment in an environment that harbours misguided stereotypes of what the
military retiree has to offer. The intrapsychic problems are in part
attributable to the anticipated loss of status and high degree of
responsibility for men and equipment, loss of the security which guarantees
the individual that he and his family will be cared for if he becomes ill or
incapacitated, loss of friends and lifestyle structure, and the spectre of
impending ambiguity, uncertainty, complexity, and conflict attached to
"starting over" in a civilian career.


According to Druss (1965) and Greenberg (1965), these social and
intrapsychic problems during the latter part of the serviceman's career lead
to marital discord, excessive drinking, depression, insomnia, a variety of
psychosomatic complaints, and reduced work performance. Researchers in the
Canadian context have also examined problems associated with mid-career
change. Pinch and Hamel (1978) pointed out that many long-term
servicemembers lack knowledge concerning the civilian job market, are unable
to meet formal educational prerequisites for jobs, and have failed to
recognize the necessity for advanced pre-retirement planning. These authors
also found that personnel from "hard" military occupations (i.e., trades
without readily discernible civilian counterparts) were unemployed longer
and were less likely to regain former pay and status levels. Second career
concerns affect all members to some degree and can have a deleterious effect
on personnel performance during the last several years of service.

Recognizing that providing pre-retirement assistance not only
fulfills a moral obligation, but also impacts favourably on operational
effectiveness, the Department of National Defence (DND) has instituted the
Second Career Assistance Network (SCAN) program. Components of SCAN include
instruction in resume preparation and job search techniques, information on
financial planning, military/civilian trade accreditation, and counselling
on personal and social adjustment concerns judged to be significant to
mid-life career change. The aim of the program is to help the participants
become more psychologically and practically prepared to leave the military
and begin a second career. Even though SCAN has been functioning since
1978, its effectiveness for readying personnel for retirement has yet to be
established. The work reported in this paper is part of research being
conducted to assess SCAN's effectiveness in aiding the second career
transition process.

To examine the effects of SCAN, the psychological construct
vocational or career maturity was considered to be an appropriate conceptual
framework. In previous research, career maturity was shown to be a major
determinant of educational and occupational success for adolescents (Super,
Crites, Hummel, Moser, Overstreet & Warnath, 1957; Super & Overstreet, 1960;
Zelkowitz, 1974). Recognizing the difficulty inherent in measuring
vocational maturity beyond adolescence, Super (1977) presented a revised
theoretical model better suited to understanding career development in
adults. Later modifications (Super & Knasel, 1979; Super & Kidd, 1979;
Super, 1983) operationalized the model's dimensions and recommended that, in
the case of adults, the term "career adaptability" be used instead of
vocational maturity.

Thus, according to Super's adult paradigm, career adaptability is
made up of five dimensions: Planfulness, Exploration, Information,
Vocational Decision-Making, and Reality Orientation. This paper will
concentrate on the Exploration or Exploratory Behaviour (EB) dimension of
the model. EB is a broad concept concerned with attitudes toward vocational
resources, ability to distinguish their worth from a personal perspective,
and willingness to utilize them in searching for a second career. Due to
the unique context of this research, the more descriptive term military
disengagement readiness (MDR) is used synonymously with career adaptability
throughout the paper.

Inasmuch as career adaptability or military disengagement readiness
is a multi-dimensional psychological concept (Jordaan & Heyde, 1979; Super &
Overstreet, 1960; Super & Knasel, 1979), it provides an appropriate medium
through which the effects of the SCAN program can be measured. If SCAN is
effective, there should be a positive relationship between participation in
the program and MDR. For those with marketable civilian skills (e.g.,
electrician, plumber), SCAN appears to offer services that are immediately
useful (e.g., resume writing, trade certification, job search techniques).
However, for those whose trades are only remotely marketable, such as Combat
Arms, the services offered by SCAN (e.g., information on academic upgrading,
skill retraining courses) offer a more long-term solution to their
problems. This study treated the SCAN program and military trade category
as independent variables and measured their effects upon one of the
disengagement readiness dimensions, i.e., the dependent variable, EB.
More specifically, this research investigated: a) whether there are
different initial levels of MDR, as measured by the EB scale, across
military occupations; b) the overall effect of SCAN on EB; and c) whether
trades were differentially affected by SCAN.

METHOD

Subjects were 354 male non-commissioned officers divided into two
groups: the SCAN treatment group, those who were registered in the program
and had attended at least one seminar; and the No SCAN control group, those
who had no contact with the program but intended to register in SCAN before
retiring. Within each of these groups, three trade categories were imposed:
Cat 1, "hard" military trades with no civilian counterpart; Cat 2, low to
semi-skilled trades with civilian counterpart; Cat 3, highly skilled trades
with civilian counterpart. The three trade categories were developed from
occupational descriptions contained in the Canadian Classification and
Dictionary of Occupations (CCDO). Cat 1 contained 18 trades, Cat 2
contained 39, and Cat 3 contained 93.

The Military Disengagement Readiness Inventory (MDRI) was designed to
measure disengagement readiness within CF retirees. The MDRI questionnaire
contained eight scales reflecting Super's dimensions of career
adaptability. The EB scale was developed based on the Super and Knasel
(1979) and Jordaan (1963) operational definition of vocational exploratory
behaviour. It sought to measure the individual's involvement in such second
career planning areas as learning about job search techniques, writing
vocational tests, determining pension benefits and obtaining information
about retraining programs. The scale contained 16 items and was scored
using successive categories from 1 to 6. An item analysis indicated a
coefficient alpha of .9038 for this scale.
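
For readers unfamiliar with the statistic, the following is a minimal sketch of how a coefficient alpha of the kind reported above is usually computed from item-level responses. The function name and the small response matrix are hypothetical illustrations; the MDRI item data are not reproduced in this paper, so the sketch does not recover the .9038 figure.

```python
from statistics import pvariance


def coefficient_alpha(responses):
    """Cronbach's coefficient alpha.

    responses: list of equal-length lists, one list of item scores
    per respondent (hypothetical layout, not the MDRI file format).
    """
    k = len(responses[0])                     # number of items in the scale
    items = list(zip(*responses))             # one tuple of scores per item
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical data: 4 respondents x 3 items scored 1 to 6.
demo = [
    [1, 2, 1],
    [3, 3, 4],
    [5, 4, 5],
    [6, 6, 5],
]
alpha = coefficient_alpha(demo)
```

With 16 items scored 1 to 6, total EB scores can range from 16 to 96, which is the range in which the group means reported in Table 1 fall.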

The SCAN group was administered the MDRI at the start of a SCAN
seminar, and again two months afterwards. To generate a control group, the
MDRI was sent out during the same timeframe to members identified as having
five or less years to serve before retirement. Individuals who indicated on
the first MDRI that they were planning to register in SCAN at a later date
were selected as the No SCAN group and received a second questionnaire at
the same time as the SCAN group.

ANALYSIS AND RESULTS

A split plot ANOVA was used with two between subject factors, i.e.,
trade category and treatment group. The within subject variable, EB, was
measured via the two MDRI administrations approximately two months apart.
During the intervening period, the SCAN treatment group attended a SCAN
seminar and participated in other program activities, whereas the No SCAN
group was not exposed to any aspect of the program.


TABLE 1

Mean Exploratory Behaviour Scores of
Treatment Group Over Time*

GROUP          TIME 1        TIME 2

SCAN           47.87         57.04
(N=270)        SD=17.51      SD=18.45

No SCAN        39.85         42.45
(N=84)         SD=16.28      SD=16.83

* Trade group means are not shown as all differences were non-significant.


Mean before and after EB scores for the SCAN and No SCAN groups are
presented in Table 1. The overall F value indicated that there is no
significant trade by treatment interaction over time. However, the
treatment by time interaction was significant, F (347,1) = 14.60, p<.001,
with the group receiving SCAN showing a greater increase in EB on the second
measure of the MDRI. There were also significant main effects for time, F
(347,1) = 108.11, p<.001, and treatment, F (347,1) = 30.68, p<.001, but not
for trade, F (347,2) = .4759, p>.05. (Reported F values are based on
multivariate test results.)


Paired comparisons, using Dunn's test of significance (p < .05/4 =
.0125), indicated that upon the first testing, there was a significant
difference between the SCAN and No SCAN groups, with the SCAN group
displaying greater EB, t (359) = 3.57, p<.001. The second administration
showed even greater difference between these two groups, t (359) = 6.46,
p<.001. Over time, the SCAN group showed a significant increase in EB, t
(269) = 10.63, p<.001, whereas the No SCAN group remained unchanged, t (83)
= 1.91, p = .06.
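
The per-comparison criterion used above can be illustrated with a short sketch. It assumes that Dunn's procedure simply divides the familywise significance level by the number of planned comparisons, as the .05/4 figure in the text indicates. The dictionary labels are descriptive names introduced here, and the three values of .001 are upper bounds for results reported only as p<.001.

```python
def dunn_threshold(familywise_alpha, n_comparisons):
    """Per-comparison significance level under Dunn's procedure."""
    return familywise_alpha / n_comparisons


threshold = dunn_threshold(0.05, 4)  # .05 / 4, the criterion quoted in the text

# p-values as reported: .001 is an upper bound where the text gives p<.001.
reported = {
    "SCAN vs No SCAN, time 1": 0.001,
    "SCAN vs No SCAN, time 2": 0.001,
    "SCAN, time 1 vs time 2": 0.001,
    "No SCAN, time 1 vs time 2": 0.06,
}

# A comparison counts as significant only if it clears the stricter threshold.
significant = {label: p < threshold for label, p in reported.items()}
```

Under this criterion the No SCAN change over time (p = .06) is not significant, while the other three comparisons are.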


DISCUSSION AND CONCLUSION


The aim of this research was to determine whether members trained in
trades having readily recognizable civilian counterparts are more prepared
to make the transition to a second career, whether SCAN has an impact on the
readiness of the individual to make the transition, and whether SCAN
differentially affects readiness to seek a second career across trade
categories. As demonstrated in this research, SCAN is definitely affecting
an individual's predisposition to explore various ways of finding second
careers. This corroborates earlier research conducted with adolescents
demonstrating that career maturity can be taught (Yates, Johnson, & Johnson,
1979). However, all trades appear to be affected equally by SCAN. That is,
those in "hard" military occupations report the same exploratory behaviour
as those in more readily marketable occupations. An explanation for this
finding may be that the EB items have more obvious applicability to second
career concerns than other MDRI scales. Such activities as preparing a
career resume, learning about job search techniques, and writing vocational
tests would appear to have equal relevance across trade categories.
However, this finding also may be an indication that the military
occupations could be rearranged on a rational or statistical basis to
reflect a better delineation of categories.

The results of this research suggest that SCAN is having a positive
effect on at least one aspect of second career adaptability. It seems
likely that an increase in knowledge and skills directed at easing the
transition to the civilian workforce would reduce the stress and anxiety
associated with retiring from the military. This leads to the belief
that not only is SCAN fulfilling a moral obligation, but it likewise may be
helping to maintain a more effective servicemember over the last several
years prior to retirement.

This paper reported on the exploration dimension of the second career
adaptability concept (Super, 1983). Although EB is only one of five
discrete dimensions encompassed by second career adaptability, the results
of this research suggest that the construct can be operationalized and used
for gaining greater understanding of military second career transition.
However, much stronger evidence to support the validity of the construct
would be provided by investigating the post-retirement status of SCAN and
non-SCAN retirees.

Development of an Air Force
Training Decisions System

Sharon  K.  Garcia 

Air  Force  Human  Resources  Laboratory 
Brooks  Air  Force  Base,  Texas 

I. Introduction

The management of technical training requires the coordination of efforts
among numerous Air Force organizations. Significant roles in determining the
nature of technical training are filled, for example, by the Air Force deputy
chief of staff for manpower and personnel; by Air Force functional managers;
and by agencies at the Air Force Manpower and Personnel Center in charge of
job classification, assignments, and management of the on-the-job training
programs. The Air Training Command at its headquarters, technical training
centers, and other units plays a central role in policy as well as day-to-day
training management. The headquarters of all major commands have a major
voice in training decisions; and additional inputs are gathered from various
ad hoc training study groups, AF laboratories, and research centers.

Because of the scope and complexity of the training and personnel systems,
many decisions that impact on training are made, to some extent independently,
by different management units responsible for different parts of the training
and personnel systems. Conflicting goals are inevitable. As each unit
attempts to optimize operations within its own area, the net result is
competing objectives and total system suboptimization. Part of the problem is
that relevant data for many training decisions are not available. For
example, in practice, the amount of resident technical school training is
largely determined by pre-defined budgets, and whatever content is not covered
in the school is left to on-the-job training (OJT), with little data for
assessing the impacts on OJT resources, costs, and capacities. In addition,
alternative ways in which the overall training system might be restructured in
concert with operational changes in personnel utilization are not considered.
Hence, a more unified total systems approach to such problems, with all
relevant data considered, was needed. In response to this need, Air Staff and
the HQ Air Training Command requested the AF Human Resources Laboratory to
conduct research in an attempt to refine current training decision procedures.
The research, entitled the Training Decisions System (TDS), has as its
objective the development of a computer-assisted decisions system to address
the what, when, and where training decisions for any one specialty. What
refers to what job tasks, skills, and knowledges should be emphasized in
training. When refers to at what point in an airman's career training should
be provided (e.g., upon entry, OJT, advanced training). Where refers to the
question of where training should be provided (e.g., resident school, field
units, CDC, MAJCOM, other).

II. Development of the Training Decisions System (TDS)

The TDS will involve the development of four basic, user-friendly
interactive subsystems. They are the Task Characteristics Subsystem (TCS),
Field Utilization Subsystem (FUS), Resource/Cost Subsystem (RCS), and the
Integration and Optimization Subsystem (IOS).

The Task Characteristics Subsystem (TCS) will be developed and used for
identifying clusters of job tasks that can be appropriately trained as a unit,
based on common skill and knowledge requirements and other shared
characteristics (e.g., the probability of co-performance). These units will
be known as Task Training Modules (TTMs) and will be used as the basic unit of
analysis in the Training Decisions System. Once task training modules have
been designed, a methodology will be developed for allocating these TTMs to
the appropriate training settings or sites, which may include initial skill
resident training, on-the-job training, or correspondence courses.

The Field Utilization Subsystem (FUS) will be used to describe existing
patterns of airman utilization in terms of jobs, training states, and major
career paths. In addition, the FUS will attempt to define training/personnel
assignment patterns that represent alternative approaches to training,
assignment, and use of airmen in a particular specialty, based on management
preferences. Both training content and job descriptions will be represented
by collections of task training modules.

The Resource/Cost Subsystem (RCS) will be developed for estimating the costs
and resources required for training different clusters of job tasks in
alternative training settings; for estimating the training capacities of
alternative training settings; and for developing summary estimates of costs,
resources, and capacities needed for specified training alternatives.

The Integration and Optimization Subsystem (IOS) will result in the
integration of the three previously described subsystems and develop effective
decision aids for AF technical training designers. The final product of the
IOS will be the Training Decisions System: a user-friendly software package,
bringing together a complex of elements describing training, personnel, and
cost factors, as well as management policy preferences. State-of-the-art
decision model technologies will then be applied to these elements to answer
"what if" questions for management, and to develop "optimal" training designs,
for more cost-effective training decisions. A conceptual diagram of each of
the TDS subsystems is provided in Figure 1.

Research  and  development  of  the  TDS  is  a  four-year  contract  effort.  The 
prime  contractor  is  McDonnell  Douglas  Astronautics.  The  contract  began  in 
Sept  83  and  will  be  completed  in  Sept  87.  Initial  development  of  the  TDS  is 
being  applied  to  four  Air  Force  career  ladders  or  occupations.  They  are 
Avionic  Inertial  and  Radar  Navigation  Systems,  Security/Law  Enforcement, 
Aircraft  Environmental  Systems,  and  Electronic  Computer  and  Switching  Systems. 

III. Conclusions

Once completed, the Training Decisions System research will produce a
system that provides readily available, validated information to
the Air Staff and user commands, especially Air Training Command, on costs and
consequences of training decision alternatives under different constraints,
costs, and personnel utilization patterns. The following benefits are
anticipated from the implementation of such a system: (a) enhanced mission
readiness through optimizing the match of technical training resources and
overall operational demands, (b) increased training efficiency through
optimizing the sequence and settings in which training occurs, (c) improved
personnel utilization through development of methods for analyzing functional
job patterns in relation to optimized training sequences, (d) increased cost
effectiveness of training through the formulation of training decisions based
on explicit cost and resource consequences, and (e) reduction of excessive
operational training commitments through more accurate estimation and analysis
of unit capacity to train while meeting ongoing mission demands. In short,
TDS will give managers the tool to plan for the best training for the dollar.

DEFINING TASK TRAINING MODULES:
COPERFORMANCE CLUSTERING

Drs. B. M. Perrin, D. S. Vaughan, R. M. Yadrick,
& J. L. Mitchell

McDonnell Douglas Astronautics Company

Training decision-making in the Air Force is a process of balancing, either
explicitly or implicitly, a number of distinct and often conflicting considerations.
Instructional effectiveness, manpower and personnel utilization plans, and financial
factors must all be balanced in deciding who gets trained, when during their career,
on what skills, and using which modes of instruction. Currently, Utilization and
Training Workshops are used as a forum to weigh these considerations in determining
training policy for an Air Force Specialty (AFS). Difficult decisions are made even
more complex, however, because the information available to these groups is often
fragmentary, due to the number of distinct skills and knowledges in an AFS, the number
of possible instructional modes, and the costs associated with training each skill
using each mode.

The Training Decisions System (TDS) will be a means of bringing together
information concerning each of these factors--instructional, personnel utilization,
and financial--to aid Air Force managers in establishing training policy. The key
elements of the TDS are the sets of skills and knowledges for which the relative
instructional efficiencies of various training and utilization policies will be
determined. These sets of skills and knowledges will be in the form of groups of
Occupational Survey (OS) tasks, known as Task Training Modules (TTMs). Ideally, TTMs
will be groups of OS tasks that are relatively homogeneous with respect to underlying
skills and knowledges and that are relatively distinct from other groups of OS tasks
(i.e., other TTMs); consequently, TTMs should capture efficiencies of training that
might result from common training materials, content, equipment, and the like.

Additional training efficiencies accrue from training similar tasks together if
these tasks are also coperformed (i.e., performed by the same personnel). Thus, TTMs
should be composed of tasks which are similar and which are performed by the same
personnel if the information to be provided by the TDS is to be maximally useful. In
a separate paper that follows in these proceedings (Yadrick, Vaughan, Perrin, and
Mitchell, 1985), we have documented our method of obtaining expert judgments
concerning task similarity. In this paper, we present procedures that may be used to
hierarchically cluster tasks based on their reported coperformance in the OS. These
statistical clusterings will then be compared to experts' groupings of the same tasks.
Results are presented for two AFSs--328X4, Avionic Inertial and Radar Navigation
Systems; and 811XX, Security, Law Enforcement, and Law Enforcement-Military Working
Dog Qualified.


METHOD 

Statistical clustering has been used for some time in military occupational
analysis to aid job analysts in identifying job-types (i.e., groups of individuals
performing similar tasks). The clustering technique that has been applied to this
problem and is utilized in the Comprehensive Occupational Data Analysis Programs
(CODAP) system is the average linkage clustering procedure (Ward, 1963). This
procedure has performed well in empirical studies, as compared to other procedures
reported in the statistical literature (Milligan, 1981; Mojena, 1977).


Figure 1 illustrates the relationship between case (or person) clustering, the
normal application of CODAP to job-typing, and task clustering, the application to
identifying tasks that are performed by the same personnel. (Note: Air Force
job-typing normally uses a relative time spent measure for clustering cases rather
than the performed/not performed dichotomy depicted in Figure 1; in this case, a number
between 0 and 1 representing relative time spent on a task would replace the 1's in
the figure.) While job-typing involves grouping persons who perform the same (or
similar) sets of tasks, task clustering produces sets of tasks which are frequently
coperformed.

Task  clustering  can  be  accomplished  in  CODAP  by  transposing  the  raw  data  file 
that  is  input  to  the  system.  That  is,  instead  of  task  performance  data  for  each 
person  as  represented  on  the  left  side  of  Figure  1,  each  record  would  consist  of 
person  performance  data  for  each  task  (as  represented  on  the  right  side  of  Figure  1). 
This transposed data would then be processed by CODAP, yielding a task coperformance
cluster  diagram. 
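
The transposition described above can be sketched in a few lines. This is an illustration only: the list-of-lists layout, the variable names, and the three-person, four-task data are assumptions made for the example, and the sketch does not represent the CODAP input file format.

```python
def transpose(person_by_task):
    """Turn a person x task matrix into a task x person matrix."""
    return [list(column) for column in zip(*person_by_task)]


# Hypothetical data: 3 persons (rows) x 4 tasks (columns); 1 = performed.
person_by_task = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]

# After transposition each record describes one task across all persons,
# so a case-clustering program would group tasks rather than persons.
task_by_person = transpose(person_by_task)
```

Each row of the transposed matrix now plays the role of a "case", which is why the number of persons becomes the number of "tasks" seen by the clustering program.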

Transposing the data input file has the effect of making the number of cases
appear to the system as the number of tasks, and the number of tasks becomes the
number of cases. This effect can have serious practical implications when the number
of cases is large, as CODAP is limited in the number of "tasks" (transposed cases) it
can process (the IBM version can handle up to 2000 tasks, and the UNIVAC version can
process up to 1700, although a UNIVAC rewrite of CODAP will be able to process 3000
tasks).


FIGURE 1: A COMPARISON OF CASE CLUSTERING AND TASK CLUSTERING

To avoid this limitation, one may compile a task similarity matrix external to
CODAP, and then use this matrix to cluster the task data. The performed/not
performed similarity measure used in CODAP is as follows:

S_{ij} = (N_{ij}/N_i + N_{ij}/N_j) / 2

where N_{ij} is the number of persons performing both tasks i and j, N_i is the number
of persons performing task i, and N_j is the number performing task j. In words, the
similarity between pairs of tasks is the average of the two ratios of the number of
persons performing both tasks divided by the number performing each task.
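
A minimal sketch of the similarity measure just described follows, computed directly from a performed/not performed matrix. The function names and the small data set are hypothetical, and returning zero for a task that nobody performs is an assumption made here to avoid division by zero; the text does not say how such tasks are treated.

```python
def coperformance_similarity(person_by_task, i, j):
    """Average of N_ij/N_i and N_ij/N_j for tasks i and j."""
    n_i = sum(row[i] for row in person_by_task)        # persons performing task i
    n_j = sum(row[j] for row in person_by_task)        # persons performing task j
    n_ij = sum(1 for row in person_by_task if row[i] and row[j])  # performing both
    if n_i == 0 or n_j == 0:
        return 0.0  # assumption: an unperformed task is similar to nothing
    return (n_ij / n_i + n_ij / n_j) / 2


def similarity_matrix(person_by_task):
    """Task x task matrix of coperformance similarities."""
    n_tasks = len(person_by_task[0])
    return [
        [coperformance_similarity(person_by_task, i, j) for j in range(n_tasks)]
        for i in range(n_tasks)
    ]


# Hypothetical data: 3 persons (rows) x 4 tasks (columns); 1 = performed.
person_by_task = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
sim = similarity_matrix(person_by_task)
```

In this example tasks 0 and 1 are always performed by the same two persons, so their similarity is 1, while tasks 0 and 2 are never coperformed and have a similarity of 0.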

Both procedures--the transposition of the raw data and the computation of a
similarity matrix--were used to analyze task coperformance in the 328X4 OS data to
verify the methods. Because of the number of cases in the 811XX OS sample, in excess
of 6000, a task similarity matrix computed outside of CODAP was used to cluster the
data.


One measure of the homogeneity of the tasks taken to form a TTM is the between
group similarity. Between group similarity is the average of the similarities of the
tasks between the groups being merged to form a particular cluster. It is defined as
follows:

BGS = (1/n) \sum_{i} \sum_{j} S_{ij}

where S_{ij} is the similarity between tasks i and j from the two groups to be merged
and n is the number of task comparisons between the groups.
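
The between group similarity, and the way an average linkage procedure might use it to choose the next merge, can be sketched as below. The helper `best_merge`, the similarity values, and the starting groups are all hypothetical; the sketch shows one merge decision under these assumptions and is not the CODAP implementation.

```python
def between_group_similarity(sim, group_u, group_v):
    """Average similarity over all task pairs with one task in each group."""
    comparisons = [(i, j) for i in group_u for j in group_v]
    return sum(sim[i][j] for i, j in comparisons) / len(comparisons)


def best_merge(sim, groups):
    """Return (score, a, b): the pair of groups with the highest
    between group similarity, as an average linkage step would pick."""
    best = None
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            score = between_group_similarity(sim, groups[a], groups[b])
            if best is None or score > best[0]:
                best = (score, a, b)
    return best


# Hypothetical task x task similarity matrix for four tasks.
sim = [
    [1.0, 1.0, 0.0, 0.25],
    [1.0, 1.0, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.75],
    [0.25, 0.5, 0.75, 1.0],
]
groups = [[0, 1], [2], [3]]   # tasks 0 and 1 already merged
score, a, b = best_merge(sim, groups)
```

Repeating this step until one group remains yields the hierarchical solution, and the score recorded at each merge indicates how homogeneous the resulting cluster is.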

Several statistics are available to compare clusterings or groupings of a single
sample, based on a pairwise classification of cases for the two solutions. Table 1
illustrates the classification scheme, where each of the four cells specifies a type
of agreement or disagreement between the solutions. For example, cell A indicates the
number of pairs of cases grouped by both methods, while cell D is the number of pairs
grouped separately by both solutions. Cells B and C indicate frequencies of
disagreement in which pairs are grouped by one method, but not the other. Three
comparison statistics can be defined in terms of these cell frequencies as follows:

Rand (1971): (A + D) / (A + B + C + D)

Jaccard (Downton & Brennan, 1980): A / (A + B + C)

Fowlkes & Mallows (1983): A / sqrt((A + B)(A + C))


Table 1: A pairwise classification scheme used to compare two clustering solutions.

All three statistics yield a value of 1.00 when the two solutions agree
perfectly, and all three have a lower bound of 0. The Rand statistic, despite being
the most widely used, is often inflated by using pairs not classified together by
either procedure (cell D) as reflective of solution consistency. Both the Jaccard and
the Fowlkes & Mallows statistics were devised, in part, to overcome this problem.
Sampling distributions are not available for these statistics; consequently, the
numbers are not directly interpretable as indicating either agreement or disagreement
between methods.
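
The three comparison statistics can be computed from two sets of cluster labels as in the following sketch. The function names and the five-task example are hypothetical. Which solution corresponds to cell B and which to cell C is an assumption made here; the choice does not change any of the three statistics.

```python
from itertools import combinations
from math import sqrt


def pair_counts(labels_1, labels_2):
    """Count task pairs in cells A, B, C, D of the pairwise scheme."""
    a = b = c = d = 0
    for x, y in combinations(range(len(labels_1)), 2):
        same_1 = labels_1[x] == labels_1[y]   # grouped together by solution 1
        same_2 = labels_2[x] == labels_2[y]   # grouped together by solution 2
        if same_1 and same_2:
            a += 1
        elif same_1:
            b += 1        # assumption: B = together in solution 1 only
        elif same_2:
            c += 1        # assumption: C = together in solution 2 only
        else:
            d += 1
    return a, b, c, d


def rand(a, b, c, d):
    return (a + d) / (a + b + c + d)


def jaccard(a, b, c, d):
    return a / (a + b + c)


def fowlkes_mallows(a, b, c, d):
    return a / sqrt((a + b) * (a + c))


# Hypothetical groupings of five tasks by two methods.
statistical = [0, 0, 1, 1, 2]
expert = [0, 0, 1, 2, 2]
cells = pair_counts(statistical, expert)
```

In this example seven of the ten pairs fall in cell D, so the Rand statistic is 0.8 even though only one pair is grouped together by both solutions; the Jaccard and Fowlkes & Mallows values are considerably lower. Note that the last two functions are undefined when no pair is grouped together by either solution.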


Two complications in comparing expert and statistical groupings of the tasks are
worth noting at the outset. First, due to infrequent performance of some tasks within
an AFS, the experts dropped some tasks from their groupings. While the number of
deleted tasks was generally very small, one group of experts dropped 27 tasks from the
328X4 OS task list. When this complication occurred, these tasks were omitted from
the analysis, and so are not reflected in the comparison statistics.

The second complication resulted from experts placing the same task in two or
more TTMs--an action they were directed to take if they believed it was necessary.
Again, in absolute terms, this action was taken relatively infrequently, although one
task statement was placed in five different TTMs by one group of experts. This
complication was handled by counting each occurrence of the task individually; that
is, one sorting by the experts may have agreed with the coperformance clustering (and
been counted in cell A of Table 1), while a second sorting may have disagreed with the
statistical clustering (and been counted in cell B or C).

RESULTS 

Unlike case clustering to identify job-types, which has more than 30 years of
research and practice behind it, very little information exists to aid in identifying
TTMs from a task coperformance cluster diagram. Rules-of-thumb which occupational
analysts have adopted to narrow the search for important jobs may or may not be
relevant to the search for TTMs. Particularly problematic is the determination of the
number of TTMs. Since the clustering solution is hierarchical, TTMs of any degree of
specificity can be identified.

An approach to this problem is to use the same type of heuristics a job analyst
would use to interpret a case clustering. One of the authors of this paper, who is
familiar with both occupational analysis and the 811XX career field, used this
approach. He identified nineteen general task content areas (similar to the higher
order job clusters in job analysis) and 67 TTMs within those general areas. The TTMs
varied in size from 2 to 34 tasks and averaged just under 10 tasks per TTM. The
between group similarities of the TTMs identified from the 811XX OS task coperformance
clustering ranged from 26.75 to 91.20, and averaged 58.87.

While  these  TTMs,  by  definition,  have  the  desirable  characteristic  of  being 
composed  of  tasks  which  tend  to  be  performed  by  the  same  personnel,  the  degree  to 
which  the  tasks  are  similar  in  terms  of  skills  and  knowledges  is  not  known.  Some 
indication of the skill/knowledge homogeneity of the coperformance clusters can be
obtained  by  comparing  these  results  to  those  obtained  from  having  experts  group  the 
tasks. 

Two  additional  issues  had  to  be  addressed  before  the  results  of  the  expert  and 
statistical  clusterings  could  be  directly  compared.  First,  the  number  of  statistical 
clusters  had  to  be  determined,  since  the  comparison  statistics  are  influenced  by  the 
specificity  of  the  results.  To  promote  comparability  between  the  procedures,  the 
number  of  statistical  clusters  was  set  equal  to  the  number  of  expert  task  groupings 
for  each  comparison.  The  second  issue  dealt  with  how  to  select  the  statistical 
clusters;  it  was  decided  to  use  the  task  clusters  which  maximized  between  groups 
homogeneity.  This  purely  statistical  criterion  was  chosen  because  it  permitted 
selection  of  different  sets  of  TTMs  with  different  degrees  of  specificity.  This 
approach also eliminated the variability that might have resulted from differences
between analysts, had interpretations of the coperformance cluster diagram been used.
Thus, the comparisons reflect the degree to which the statistical clusterings, without
interpretation, capture expert task groupings.

Table 2 summarizes the results of the comparisons between the coperformance
clustering produced by CODAP and two task groupings produced by separate groups of
experts. Both the Jaccard and the Fowlkes & Mallows statistics are reported. The
comparison between the two expert groupings in the 328X4 AFS produced the highest
degree of convergence, substantially higher than that found between the 811XX expert
groups. It should be noted, however, that the 328X4 groups consisted solely of
technical trainers, while the 811XX groups contained both school and field personnel.
The similarity of the backgrounds of the 328X4 experts may partially account for the
greater convergence of their results.

The task coperformance clustering in the 811XX AFS matched the expert groupings
as well as the experts' results matched each other. Additionally, the coperformance
clustering agreed more closely with the experts' classifications in this specialty
than in the 328X4 AFS. Again, this result may be due to the influence of the field
personnel, whose perspective presumably is more attuned to performance factors.

CONCLUSIONS 

The evidence concerning task coperformance clustering suggests that this
procedure is workable and produces task groupings which are homogeneous with respect to
content. Once computer software is developed, task coperformance clustering can be
accomplished at relatively little additional expense, using existing OS data and CODAP
clustering procedures. As this methodology develops, additional items may be added to
the OS and computer routines derived to aid in the identification and interpretation
of TTMs.


Table 2: Comparison of statistical and expert groupings of tasks in the 328X4 and the
811XX AFSs. The Jaccard and the Fowlkes & Mallows statistics are reported.
Expert groupings I and II are results from independent efforts.

The agreement between the statistical and expert groupings of the tasks, given
that it was roughly equivalent to that between independent clusterings by experts, is
encouraging. These results are even more promising considering that the task clusters
were based solely on a statistical criterion, the between groups similarity. We would
not recommend that this technique be directly implemented. Rather, an experienced
analyst, working with one (or more) experts in the career field, should be used to
identify and refine TTMs from a coperformance cluster diagram. This procedure would
almost undoubtedly produce more homogeneous and useful TTMs.

One final word of caution is required. The skill/knowledge homogeneity of the
task clusters produced by this methodology is still somewhat suspect, as the overall
comparison statistics reported cannot specify the nature of the disagreement between
results. Presumably, the experts' groupings might differ only according to subtle
variations among tasks that are clustered by one group, but not the other. On the
other hand, coperformance clusters, while exhibiting the same absolute number of
differences, might include strikingly different types of tasks that are, nonetheless,
coperformed. A more stringent validation effort is currently underway to assess this
possibility.


INTRODUCTION


One  of  the  most  straightforward  ways  to  categorize  information  is 
simply  to  have  people  sort  cards  into  piles.  A  single  instance  of  the 
categories  to  be  formed  is  printed  on  each  card,  and  the  piles  that 
result  constitute  the  categories.  This  procedure  has  a  long  history  in 
psychology  for  research  in  cognitive  modeling,  and  seemed  suitable  for 
use in forming Task Training Modules (TTMs) in support of the Training
Decisions System (TDS) research (Garcia, 1985).

The approach was pilot tested at Scott AFB, with 811XX personnel
(law enforcement job incumbents) serving as subjects, or Subject Matter
Experts (SMEs). Cards were labeled with individual tasks from the most
recent Occupational Survey (OS) on the 811XX AFS. Cards were initially
grouped  into  "starter  piles",  in  which  all  the  tasks  shared  a  common 
Specialty  Training  Standard  (STS)  paragraph  reference.  Starter  piles 
provided  SMEs  with  initial  working  units  of  manageable  size,  and  also 
with  a  reasonable  conceptual  base  (the  STS  paragraphs)  from  which  to 
start. 

SMEs  were  instructed  to  simply  rearrange  the  starter  piles  to  form 
groups  of  tasks  that  "should  be  trained  together".  Piles  could  contain 
many  tasks,  few  tasks,  or  only  a  single  task.  Duplicate  task  cards 
could  be  placed  in  different  piles.  These  rather  flexible  instructions 
and  liberal  sorting  options  were  introduced  in  order  to  allow  maximum 
expression  of  SME  opinion  and  avoid  forcing  upon  them  any  of  our  own 
notions  about  how  TTMs  should  be  formed. 

There  were  two  "passes"  for  each  within-job  sort.  That  is,  SMEs 
initially  rearranged  the  STS  piles  into  their  own  piles.  They  then  went 
through  their  own  piles  to  make  any  needed  changes.  They  worked  at 
their  own  pace  with  no  time  constraints. 

The  results  of  this  pilot  test  are  reported  elsewhere  (Vaughan, 
Yadrick,  Dunteman,  and  Clark,  1984)  and  need  not  be  examined  here  in 
detail.  Briefly,  the  resulting  TTMs  seemed  like  reasonable  task 
groupings,  and  the  overall  card-sorting  approach  was  feasible  and 
deserved  field  testing.  SMEs  found  it  easy  to  work  with  the  starter 
piles,  performed  conscientiously,  and  were  satisfied  with  their  own 
results. 

In the present effort, the card-sorting method was field tested for
two AFSs. The objectives of this project were to develop an operational
system for TTM construction, and to get task clusters that could be
compared  to  computer-generated  clusters  (Perrin,  Vaughan,  Yadrick,  and 
Mitchell,  1985). 


METHOD  AND  PROCEDURE 


The card sorting process was modified and expanded for field
testing purposes. Some of the modifications reflected policy decisions
on our part (e.g., we decided not to have separate card sorts for each
job in each specialty, but rather to have whole-AFS sorts), but most
were minor changes designed to streamline the process.

The Security Police and Law Enforcement (811XX) card sorting was
conducted at Lackland AFB, Texas, in May, 1985. The fourteen SMEs
present represented a relatively even mix of the three 811XX shredouts,
namely security police, law enforcement personnel, and military working
dog (MWD) handlers. Technical trainers from ATC were represented in
fairly equal proportion with operational personnel.

The  SMEs  were  divided  into  four  separate  groups.  These  groups 
worked  independently  of  the  other  groups,  except  during  the  final  stage 
of  the  process.  This  stage  will  be  described  later.  Again  the  groups 
were  formed  to  provide  the  most  even  mix  of  SME  backgrounds,  shreds, 
etc.,  as  possible.  SMEs  then  received  instructions  and  began  sorting. 

Two  groups  received  starter  piles  in  which  all  tasks  shared  a 
common STS reference, as in the pilot study. The other two groups
received starter piles composed of tasks which had clustered together in
the coperformance clustering process (Perrin, et al., 1985). The
resulting  TTMs  could  then  be  compared  and  evaluated.  In  addition,  this 
would  help  mitigate  any  effect  of  starter  pile  content  upon  the  final 
TTMs. 


SMEs  made  three  passes,  all  at  their  own  pace.  On  the  first  pass, 
they  oriented  themselves  to  the  exercise,  examining  piles  for 
homogeneity  of  coperformance,  skills  and  knowledges,  combining  and 
subdividing  the  piles  as  necessary.  On  the  second  pass,  the  newly 
created  piles  were  recombined  to  refine  the  coperformance,  knowledge  and 
skill groupings. SMEs made further checks and refinements ("fine
tuning") on the final pass. The results (TTMs) of each pass were
recorded.

The  four  groups  finished  at  very  different  times,  as  might  have 
been  expected.  Two  groups  finished  about  halfway  through  the  second 
day. Various group dynamics were clearly observable, such as the
domination  of  one  group  and  of  the  sorting  results  by  a  single  forceful 
member  of  the  group. 

Different  groups  worked  as  though  they  had  quite  different 
interpretations  of  the  instructions.  For  example,  one  group  arranged 
piles  so  as  to  train  what  they  came  to  call  a  "super  cop".  They 
carefully arranged sequences of TTMs so that, by receiving training on
each full TTM in their prescribed sequence, the result would be an
airman who could perform literally all the tasks and jobs in all the
shreds of the specialty. Most groups, however, adopted very different
and more realistic strategies.

The final phase of the process was conducted to reconcile the
different  sortings  between  groups.  In  each  reconciliation,  groups  which 
had  started  with  the  same  piles  (e.g.,  the  groups  with  STS  starter 
piles)  were  teamed  together. 

Each reconciliation group was to provide a single set of
"final" TTMs. Unfortunately, only one reconciliation group was actually
able to finish. The other reconciliation group, made up of the two
original groups which did not finish their own sortings in the first two
days, was forced to rush through the reconciliation step.

The  same  essential  process  was  carried  out  at  Keesler  AFB,  with 
Avionic  Inertial  and  Radar  Navigation  Systems  (328X4)  SMEs  in  August, 
1985.  There  were,  however,  important  differences.  All  of  the  SMEs 
available  were  instructors  at  the  technical  training  school,  and  no 
operational  people  were  present.  Also,  there  were  only  enough  SMEs  to 
form  two  groups  instead  of  four.  As  a  result,  the  desired  replication 
could  not  be  done.  Operational  workers  reviewed  and  refined  the  TTMs 
obtained  at  Keesler,  although  these  refinements  have  not  yet  been 
examined  in  detail.  The  two  328X4  groups  received  essentially  the  same 
instructions as had the 811XX groups, although we stressed somewhat more
the  idea  that  TTMs  should  reflect  common  skills  and  knowledges  required 
to  do  particular  jobs.  This  change  was  made  mainly  to  avoid  any 
misinterpretation  of  instructions. 


RESULTS 


In the 811XX AFS, the reconciliation effort resulted in 69 TTMs
from  the  groups  that  had  coperformance  cluster  starter  piles.  Of  these, 
only  four  TTMs  contained  a  single  task,  and  52  contained  five  or  more 
tasks.  The  largest  contained  42  tasks.  These  were  sorted  from  19 
starter  piles  containing  a  total  of  666  tasks. 

The  other  reconciliation  effort  (groups  with  the  STS  starter  piles) 
resulted in 65 TTMs formed from 39 starter piles. Only one contained a
single  task.  The  largest  contained  31  tasks. 

Although  these  results  appear  to  be  in  reasonably  good  agreement, 
the  similarities  are  only  superficial.  As  reported  in  another  paper  in 
this session (Perrin, et al., 1985), both the Jaccard index (Downton &
Brennan, 1980) and that from Fowlkes and Mallows (1983) were relatively
low (0.171 and 0.293, respectively), showing relatively poor agreement
between the two groups in assigning tasks to TTMs. Indeed, both indices
were higher (0.179 and 0.350, respectively) and showed greater agreement
between the STS starter pile reconciliation and the pure coperformance
clustering than between the two card sorting reconciliation groups.

In the 328X4 AFS, the group which started with STS piles formed 140
TTMs  from  31  original  piles  containing  778  tasks.  For  the  most  part, 
these  were  small.  Eighty-six  contained  fewer  than  5  tasks,  and  many  of 
these  contained  only  one  task.  However,  two  were  especially  large, 
containing  78  and  48  tasks,  respectively. 

The  group  that  started  with  coperformance  piles  formed  33  TTMs  from 
58  original  piles.  Eleven  of  these  contained  over  thirty  tasks,  and  two 
contained  over  eighty  tasks. 

Although  there  appears  to  be  considerable  disagreement  between  the 
groups,  it  is  apparently  superficial.  Table  1  reports  Jaccard  and 
Fowlkes-Mallows indices for several comparisons of these card sorts


Table 1: Comparison of SME card sorts and task coperformance statistical clustering
in the 328X4 AFS*

                         Reconciliation  Card-Sort From  Card-Sort From  Task
                         Card-Sort       Coperformance   STS             Coperformance
                                         Starter Piles   Starter Piles   Clusters

Reconciliation           -               .326            .637            .127
Card-Sort                                .520            .787            .244

Card-Sort From           -               -               .271            .087
Coperformance Starter                                    .476            .171
Piles

Card-Sort From STS                                                       .121
Starter Piles                                                            .224

Task Coperformance
Clusters

*For each table entry, the Jaccard statistic is on top, and the Fowlkes-Mallows
statistic is on the bottom.


with each other, with the later reconciliation sort done between the two
groups, and with coperformance clustering. In general, the two groups
actually agreed relatively well. The obvious conclusion is that the 140
TTMs formed by the one group were generally subgroupings of the 33 TTMs
formed by the other. Indeed, the reconciliation data indicate fairly
good agreement with both groups (see Table 1). Also, reconciliation
resulted in 75 TTMs, roughly halfway between the numbers of TTMs
produced by the groups separately. It would seem that they found
compromise easy to make. Their TTMs, however, do not agree very well with
the computer-generated coperformance task clusters. As Table 1 shows,
both the Jaccard and Fowlkes-Mallows statistics were less than .25 for
all such comparisons.
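
The subgrouping interpretation can be checked against the reported statistics themselves. The following is a worked sketch and not part of the original analysis. A is the number of task pairs grouped together by both sorts, and B and C are the numbers of pairs grouped together by only one sort or only the other; the symbols p and r are introduced here purely for illustration. Because the table entries are rounded, the results are approximate.

```latex
% p and r are illustrative symbols, not notation from the original paper.
\[
  p = \frac{A}{A+B}, \qquad r = \frac{A}{A+C}, \qquad
  \mathrm{FM} = \frac{A}{\sqrt{(A+B)(A+C)}} = \sqrt{p\,r}, \qquad
  \frac{1}{J} = \frac{A+B+C}{A} = \frac{1}{p} + \frac{1}{r} - 1 .
\]
% If one sort is an exact subdivision of the other, one of B, C is zero
% (say C = 0, so r = 1), giving J = p and FM = \sqrt{J}.
% For the two 328X4 starter-pile sorts, J = .271 and FM = .476:
\[
  p\,r = \mathrm{FM}^2 \approx .227, \qquad
  p + r = p\,r\left(\frac{1}{J} + 1\right) \approx 1.063, \qquad
  \{p,\ r\} \approx \{.77,\ .30\}.
\]
```

Under exact nesting the Fowlkes-Mallows value would equal the square root of .271, or about .52; the reported .476 is somewhat lower. On this reading, roughly three-quarters of the pairs grouped by one sort were also grouped by the other, but only about 30 percent in the reverse direction, which is what largely nested groupings would produce. Both statistics are symmetric, so they do not by themselves show which sort is the finer one.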


DISCUSSION 


The  results  reported  here  are  tentative.  Final  TTMs  have  not  yet 
been  produced  for  either  AFS  studied.  The  328X4  TTMs  must  be  replicated 
with additional field data. Nonetheless, even at this point it appears
that  neither  the  clusters  derived  from  task  coperformance  clustering  nor 
those  constructed  by  a  single  group  of  SMEs  would  yield  stable  TTMs. 
Instead,  some  combination  of  methods  will  be  required  that  allows 
tentative  TTMs  to  be  crossvalidated  and  refined  by  representative  groups 
of  field  experts.  Once  a  method  is  developed  that  produces  final, 
stable  TTMs  for  each  specialty,  these  TTMs  can  be  used  as  criteria 
against  which  more  economical  clustering  methods  can  be  validated. 


INTRODUCTION

One of the problems with the typical occupational analysis project is the
limitation of coverage of job groups in the normal Comprehensive Occupational Data
Analysis Programs (CODAP) hierarchical clustering solution. In an operational
study, the groups of minimally acceptable size existing at any stage of the
clustering, as reflected on the diagram of the occupation, will not include all
the cases, except at a very low stage, where between-group overlap values are also
very low (see Figure 1). This is a function of both the degree of homogeneity of
jobs within the specialty and the order in which the cases enter the initial
groups. It is not unusual to have 5 to 20 percent of the cases in a study
excluded from the final job groups identified in an occupational survey report
(OSR).

While the identification of the job groups in an occupation or Air Force
Specialty (AFS) has considerable utility in the personnel classification and
training system, the final job descriptions for such groups have some inherent
limitations (Carpenter, 1974; Pass & Robertson, 1978). The hierarchical
clustering procedure normally used is an iterative process whereby, once a case is
clustered in a group, it is no longer considered in terms of its similarity of
tasks performed or relative time spent performing against later-formed groups.

Some researchers have proposed that other clustering methodologies or other
similarity  measures  be  used  (Pass,  1980).  No  one  approach,  however,  has  emerged 
as  an  optimum  clustering  alternative. 

In  the  present  study,  a  nonhierarchical  clustering  methodology  was  used  to 
refine  the  job  types  identified  in  the  normal  CODAP  process.  The  usual 
hierarchical clustering procedure was used to identify the "seed" groups and an
iterative,  nonhierarchical  clustering  method  was  added  to  refine  these  groups. 
The  expectation  was  that  a  combination  of  the  two  methods  would  provide  superior 
results  in  terms  of  group  membership,  percent  of  cases  accounted  for,  and  realism 
of  the  resulting  job  descriptions. 

METHODOLOGY 

A  runstream  of  existing  CODAP  programs  has  been  put  together  to  give  the 
CODAP  system  a  nonhierarchical  capability  (Datko,  1985;  Phalen,  Weissmuller,  & 
Staley,  1985).  This  new  procedure  was  developed  to  facilitate  analysis  of  various 
types  of  officer  job  rating  scales  (relative  time  spent  versus  part  of  the  job, 
complexity,  etc.),  and  has  already  proven  very  useful  in  that  context. 

If  a  set  of  CODAP-generated  job  descriptions  is  input  to  this  nonhierarchical 
clustering  program  as  "seed"  profiles,  an  unlimited  number  of  cases,  as 
represented  by  their  individual  job  descriptions,  can  be  classified  into 
homogeneous  groups,  according  to  which  one  of  the  "seed"  profiles  each  case 
resembles  most  closely.  Multiple  iterations  can  be  run  to  ensure  that  each  case 
has  the  opportunity  to  switch  to  a  more  compatible  group,  or  to  switch  to  a  group 
whose  homogeneity  is  increased  the  most  by  classifying  the  case  in  it,  until  the 
number of reclassifications has reached a minimum. The reclassification of cases,
previously grouped by a hierarchical clustering, is a way to improve within-group
homogeneity and between-group differences (Phalen et al., 1985). Also possible in
the system is multiple group membership by a case or group of cases.
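
The seeded reclassification described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the CODAP runstream: the function names and the tiny four-task data set are hypothetical, and the similarity measure (the sum, over tasks, of the smaller of the two percent-time values) is an assumed definition of time-spent overlap. The sketch covers only the "most similar profile" rule; it does not implement the alternative rule of moving a case to the group whose homogeneity increases most, nor multiple group membership.

```python
def overlap(p, q):
    """Assumed time-spent overlap: sum over tasks of the smaller percentage."""
    return sum(min(x, y) for x, y in zip(p, q))


def mean_profile(cases):
    """Average percent time spent on each task across a group's members."""
    n = len(cases)
    return [sum(col) / n for col in zip(*cases)]


def reclassify(cases, seeds, max_iter=6):
    """Assign each case to its most similar profile, recompute the group
    profiles, and repeat until no case switches or max_iter is reached."""
    profiles = [list(s) for s in seeds]
    assignment = [None] * len(cases)
    for _ in range(max_iter):
        switched = 0
        for i, case in enumerate(cases):
            best = max(range(len(profiles)),
                       key=lambda g: overlap(case, profiles[g]))
            if best != assignment[i]:
                assignment[i] = best
                switched += 1
        if switched == 0:
            break
        for g in range(len(profiles)):
            members = [cases[i] for i, a in enumerate(assignment) if a == g]
            if members:  # keep the previous profile if a group empties
                profiles[g] = mean_profile(members)
    return assignment, profiles


# Hypothetical data: percent time spent on four tasks by five cases.
cases = [
    [60, 30, 10, 0],
    [50, 40, 10, 0],
    [0, 10, 40, 50],
    [10, 0, 30, 60],
    [30, 30, 20, 20],   # an "isolate" not covered by either seed group
]
seeds = [[60, 30, 10, 0], [0, 10, 40, 50]]
assignment, profiles = reclassify(cases, seeds)
```

In this example the isolate joins the first group on the first pass, and the second pass produces no further switches, so the loop stops.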

While the membership of the nonhierarchically clustered job types will
overlap to a very substantial degree with those produced by the normal CODAP
clustering, this new procedure permits the "isolates" or unclustered cases to be
included in the covered sample. In addition, the resulting job descriptions
should be considerably more homogeneous in terms of both the "core" tasks of the
group and the relative time spent on those tasks. Thus, the relative worth of the
new procedure could be assessed by examining the increase in the homogeneity of
the groups. Such a procedure, then, should result in a clearer definition of the
variety of jobs within a specialty, and a more realistic determination of the
training requirements for each such job.

SAMPLE 

CODAP case data were readily available for a sample of first-enlistment
Security Policemen (AFSC 811X0) who were being studied in another line of research
(Perrin, Vaughan, Yadrick, & Mitchell, 1985). The sample included 3,302
first-enlistment individuals from the most recent Security Police occupational
analysis study (Alton, 1984). A separate first-enlistment diagram was analyzed
and 82 811XX first-term seed job groups were identified which had reasonable
internal homogeneity (overlap between combining groups ≈ 35 and overlap within
combined groups ≈ 50). Group size ranged from 3 to 210; the average group size
was 27. Some very small groups were included which were less than the starter
group size of 10; these smaller groups were identified by closely examining
several small, undefined heterogeneous groups with mixed job titles and low
within-group overlap values (such as Customs or Kennel Support). The reason for
including these very small groups was to see if they would gain sufficient
additional members in the nonhierarchical process to become legitimate job groups.

The  82  groups  accounted  for  only  67  percent  of  the  cases  in  the  sample,  which 
illustrates the coverage problem discussed earlier. Each group was named and
related  groups  were  given  overlapping  component  names  to  facilitate  analysis  of 
their  relationships. 

RESULTS 

The  initial  nonhierarchical  grouping  yielded  a  marked  reduction  in  the  number 
of cases not classified into the 82 seed job types. Only five cases remained
unclassified for a 99.8 percent coverage (see Figure 2). In succeeding
iterations, the number of unclassified cases increased slightly, but even at the
sixth iteration, the proportion of cases covered was greater than 99 percent.
Thus, it would appear that almost all of the unaccounted-for cases in the original
clustering were reasonably similar to the major job types identified in the
initial analysis. The question now became one of assessing the impact of adding
these  previously  unaccounted-for  cases  to  the  groups. 
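
The coverage figures can be reproduced from the numbers reported in the text. The short Python check below is only an arithmetic sketch: the variable names are hypothetical, and because the average group size of 27 is a rounded value, the count of cases covered before reclassification is approximate.

```python
sample_size = 3302          # first-enlistment cases in the sample
seed_groups = 82            # seed job groups from the hierarchical clustering
average_group_size = 27     # reported average (rounded)

# Approximate number of cases inside the seed groups before reclassification.
covered_before = seed_groups * average_group_size
coverage_before = 100 * covered_before / sample_size

# Five cases remained unclassified after the first nonhierarchical iteration.
unclassified_after = 5
coverage_after = 100 * (sample_size - unclassified_after) / sample_size

print(f"before: about {covered_before} cases, {coverage_before:.0f} percent")
print(f"after:  {coverage_after:.2f} percent")
```

The first figure agrees with the 67 percent coverage reported for the seed groups, and the second with the reported coverage of just over 99.8 percent.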

One  way  to  address  this  issue  was  to  examine  the  within-group  overlap  values, 
averaged  across  all  82  groups.  For  the  initial  iteration  of  the  nonhierarchical 
clustering process, the average within-group overlap was 48.23. The within-group
standard deviation averaged across all groups was 10.68. For the second run, the
mean within-group overlap value was 48.65 (vice 48.23) with an average S.D. of
8.81 (vice 10.48). Thus, this second iteration had little if any impact on the
average within-group overlap but resulted in a substantial decrease of the
within-group  variance.  This  change  in  values  indicated  that  the  new  job  group 
descriptions  were  considerably  more  homogeneous.  This  is,  however,  a  summary 
statistic  and,  in  order  to  demonstrate  a  meaningful  impact  on  the  job-typing 
process,  we  must  examine  how  the  addition  of  the  unclassified  cases  changed 
individual  job  groups.  Some  representative  results  were  selected  to  illustrate 
several  observed  trends. 
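
One way such summary values could be computed is sketched below in Python. This is a sketch under stated assumptions rather than the CODAP computation: it assumes that a group's overlap is the mean of all pairwise overlaps among its members, that overlap is the sum over tasks of the smaller percent-time value, and that the population standard deviation is used. The function names, group labels, and data are hypothetical, and each group needs at least two members.

```python
from itertools import combinations
from statistics import mean, pstdev


def overlap(p, q):
    """Assumed time-spent overlap: sum over tasks of the smaller percentage."""
    return sum(min(x, y) for x, y in zip(p, q))


def within_group_stats(members):
    """Mean and standard deviation of all pairwise overlaps in one group."""
    values = [overlap(p, q) for p, q in combinations(members, 2)]
    return mean(values), pstdev(values)


def summarize(groups):
    """Average the per-group mean overlap and per-group S.D. across groups."""
    stats = [within_group_stats(members) for members in groups.values()]
    return mean([s[0] for s in stats]), mean([s[1] for s in stats])


# Hypothetical groups of percent-time profiles over four tasks.
groups = {
    "A": [[60, 30, 10, 0], [50, 40, 10, 0], [30, 30, 20, 20]],
    "B": [[0, 10, 40, 50], [10, 0, 30, 60]],
}
group_a_mean, group_a_sd = within_group_stats(groups["A"])
average_overlap, average_sd = summarize(groups)
```

Comparing these two averages from one iteration to the next gives the kind of summary reported above: a nearly constant mean overlap with a falling standard deviation indicates tighter groups.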

Three related Law Enforcement (LE) jobs identified in the original
hierarchical clustering study included: LE Desk Sergeants (Grp 394), LE Patrolmen
(Grp 735), and a group which performs both as Desk Sergeants and Patrolmen (Grp
921). In the original study, group size for these groups was 15, 12, and 90,
respectively (see Figure 3). In the initial iteration of the nonhierarchical
clustering process, both the Desk Sergeant and Patrolmen groups more than doubled
in size, whereas the mixed group dropped in membership. It is reasonable to
assume that the 22 members lost from the combined LE Desk Sgt/Patrol group were
those performing more Desk Sergeant functions, since 22 new members appeared in
the Desk Sergeant group (Grp 594). Note that once these 22 new cases were added,
the Desk Sergeant group membership essentially stabilized--the number of cases,
the mean within-group overlap, and the standard deviation remained about the same
for iterations 1 through 6. Note also that the standard deviation for this larger
group (N = 37) dropped considerably (from 9.2 to 7.3), which demonstrates the
development of a more homogeneous group.

The composite LE Desk Sgt/Patrolmen group (Grp 921) attrited more of its
membership with each iteration of the process. Its mean within-group overlap
dropped slightly at each stage and its SD increased, indicating that the remaining
cases were more heterogeneous than in the original seed group. Presumably, if
additional iterations had been run, this group would essentially disappear.

The  LE  Patrolmen  (Grp  785),  on  the  other  hand,  first  grew  in  membership  and 
then  shrank,  while  its  mean  within-group  overlap  steadily  increased  and  its  SD 
dropped.  The  group  gained  in  membership  from  other  groups  not  included  in  this 
comparison  (there  is  a  composite  LE  Patrolmen/Entry  Controller  group,  and  other 
possible  contributors).  The  point  is  that  even  at  the  sixth  iteration,  this  group 
had  not  completely  stabilized,  and  additional  runs  might  be  needed  to  maximize  the 
within-group overlap and minimize the within-group variance.

A  set  of  three  related  Alert  Area  Security  groups  illustrates  some  additional 
trends  (see  Figure  4).  All  three  groups  involve  tasks  being  performed  by  Security 
Policemen  in  controlling  access  to  and  guarding  alert  aircraft.  Grp  283  appears 
to  be  the  most  meaningful  of  the  three  groups,  in  that  its  membership  continued  to 
expand  through  all  six  iterations,  whereas  the  other  two  groups  fluctuated.  We 
might  need  to  run  additional  iterations  to  see  how  large  this  group  would  become 
before  the  mean  overlap  is  maximized  and  the  S.D.  is  minimized.  For  the  other  two 
groups,  we  need  to  examine  their  job  descriptions  to  determine  how  they  differ 
from  Grp  283  (that  is,  what  makes  them  distinct  groups).  Only  then  can  a  judgment 
be  made  whether  these  two  small,  low-overlap  groups  should  be  retained. 

DISCUSSION

Preliminary results strongly suggest that the nonhierarchical clustering
process has considerable potential in refining the existing occupational analysis
process. The drastic drop in the number of unclustered or unaccounted-for cases
represents the greatest benefit, since the group job descriptions resulting from
increased sample size should be much more stable. We need, however, to extend our
analysis to include study of such variables as the core tasks for each group and
the amount of core job time accounted for by such core tasks. With lower
within-group variance and somewhat increased within-group overlap, the set of core
tasks should account for a significantly greater proportion of total work time.
Such a finding would have important implications for the identification of
training requirements within a specialty, both for resident programs and
on-the-job training programs. In addition, the more discretely defined job groups
resulting from this type of analysis should have considerably more construct and
content validity in the eyes of other occupational analysis users, such as
functional managers in the manpower and personnel communities.


Clearly, further work needs to be done to define the proper techniques for
using this type of system. Data displays should be developed which will permit an
analyst to track what is happening to each group across iterations. Also needed
is a method of setting aside stable groups at the end of each iteration and then
continuing to run the program on the remaining groups. The analysis process will
have to be sufficiently flexible to permit multiple runs of the later iterations
in order to optimize both the mean group overlap and standard deviation values.
Such developments are planned over the next few months.

The system could also be used to cluster a transposed task-by-person matrix
(Perrin et al., op cit) to refine Task Training Modules (TTMs) from coperformance
data. Such an application would have considerable advantages in terms of reduced
cost and multiple TTM membership for technical tasks that apply to more than one
module.

Figure 1. Clustering of Security Police Time-Spent Ratings - 811XX

On the Study of Differential Item Performance without IRT


Paul  W.  Holland 
Educational  Testing  Service 
Princeton,  New  Jersey  08541 


1. INTRODUCTION


The problem of identifying items for which the performance of certain
subpopulations (often women and minorities) is unusual and out of line with
their performance on other items or test results has a substantial history. The
book by Berk (1982) summarizes the state of the art as of 1980, and the work of
Lord (1980), Scheuneman (1979), and Shepard et al. (1981), among others, is
relevant. From a statistical point of view, modern methods for the analysis of
multi-way contingency tables seem particularly appropriate to this problem, and
some suggestions for their use have been made (Marascuillo and Slaughter,
1981). In this spirit, the present paper proposes the well-known method of
Mantel and Haenszel (1959) for the analysis of 2x2xK contingency tables as an
easily implemented, powerful technique for the measurement of the degree to
which two subpopulations of examinees perform differently on a given test item.
Modern references to the Mantel-Haenszel procedure include Breslow (1981), Hauck
(1979), and Breslow and Liang (1982). The basis for the use of the
Mantel-Haenszel (herein MH) procedure in the study of differential item
performance is the fundamental notion of the need to compare comparable people
when examining the relative performance of two groups of examinees on an item.
This is the problem of matching and is discussed in section 2. Section 3 gives
the relevant facts about the MH procedure, while section 4 discusses various
aspects of the MH procedure and related methods in the context of measuring
differential item performance.


2.  MATCHING  VERSUS  CONTROLLING  FOR  ABILITY 

The  need  to  "control  for  ability"  is  well  established  in  the  differential 
item  performance  literature.  It  is  the  fundamental  basis  for  the  proposed  use 
of  item  response  theory  methods  to  study  "item  bias."  Other  methods,  such  as 
those of Scheuneman (1979), use test performance as a proxy for ability. The
"delta-plot method," Angoff (1982), controls for ability indirectly by
concentrating attention on the covariance between the item difficulty indices for
the two groups rather than on their respective mean values.

In  my  opinion,  the  "need  to  control  for  ability"  is  an  inadequate  way  to 
express  a  more  fundamental  idea.  When  we  compare  two  subpopulations  on  any 
criterion,  it  is  always  important  to  be  sure  that  only  comparable  members  of  the 
two  groups  are  being  compared.  What  constitutes  comparability  will  depend  on 
the  problem  at  hand.  In  the  study  of  differential  item  performance  we  are 
interested  in  learning  something  about  a  test  item  and  how  members  of  one 
subgroup  (the  "focal  group")  might  react  differently  to  it  than  do  the  members 
of  another  subgroup  (the  "reference  group").  If  our  criterion  is  performance 
(i.e., right or wrong on the test item) then it is improper to compare the
performance of reference and focal group members who differ in significant and
measurable ways that are related to their performance on the item. Differential
item performance means differences in performance on an item between focal and
reference group members that are attributable to characteristics of the item and
not to differences in characteristics of the groups of examinees.

When  we  confound  both  examinee  characteristics  and  item  characteristics  and 
simply  look  at  differences  in  the  performance  on  an  item  of  reference  and  focal 
group  members  we  are  measuring  what  is  called  impact  rather  than  differential 
item  performance.  For  example,  comparing  the  proportion  of  reference  and  focal 
group  members  who  give  correct  answers  to  a  given  item  is  a  measure  of  the 
item's impact on the focal group relative to the reference group. In measuring
differential item performance, members of the reference and focal groups are
first divided into sets of examinees who are matched on relevant criteria before
their performance on the item is compared. Examples of relevant matching
criteria are scores on related tests, schooling measures, and other group
membership. In many practical settings, matching will be done on related test
scores since these are both available and accurately measured.

The 2x2xK Table. For a given item, say item j, the data from the $i$-th matched
group of reference and focal group members can be arranged as a 2x2 table:

                       Right on j   Wrong on j   Total
    Reference group      $A_i$        $B_i$      $n_{Ri}$
    Focal group          $C_i$        $D_i$      $n_{Fi}$
    Total                $R_i$        $W_i$      $n_{+i}$          (1)

for $i = 1, \ldots, K$, where $K$ is the number of matched groups. In (1), $A_i$
denotes the number of reference group members in the $i$-th matched group who
answered item j correctly. $B_i$, $C_i$, and $D_i$ have corresponding
interpretations. $n_{Ri}$ and $n_{Fi}$ denote the number of reference and focal
group members, respectively, in the $i$-th matched group, while $n_{+i}$ denotes
the total number in the $i$-th matched group of examinees. $R_i$ and $W_i$ denote
the number in the $i$-th matched group who get the item right and wrong,
respectively. Considered together these K 2x2 tables form one big 2x2xK
table. There is one such 2x2xK table for each item being considered. It is
worth emphasizing that once the criteria for matching have been selected, the
2x2xK table of data can be formed from samples of data from the reference and
focal group members. It should also be emphasized that the choice of matching
variables is important and will depend on the availability, amount, and accuracy
of data as well as on its relevance to item performance.
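
Forming the 2x2xK table from examinee records can be sketched as below. The sketch assumes matching on a single related test score, with one matched group per distinct score; the record layout, the function name `build_2x2xk`, and the sample records are hypothetical.

```python
from collections import defaultdict


def build_2x2xk(records):
    """Arrange (group, matching_score, correct) records into K 2x2 tables.

    For each matched group, A and B count reference members who got the item
    right and wrong; C and D count focal members who got it right and wrong.
    The margins n_R, n_F, R, W and n follow the layout of table (1).
    """
    cells = defaultdict(lambda: {"A": 0, "B": 0, "C": 0, "D": 0})
    for group, score, correct in records:
        if group == "reference":
            cells[score]["A" if correct else "B"] += 1
        else:
            cells[score]["C" if correct else "D"] += 1

    tables = {}
    for score, t in sorted(cells.items()):
        tables[score] = dict(
            t,
            n_R=t["A"] + t["B"],
            n_F=t["C"] + t["D"],
            R=t["A"] + t["C"],
            W=t["B"] + t["D"],
            n=sum(t.values()),
        )
    return tables


# Hypothetical records: (group, matching test score, item answered correctly).
records = [
    ("reference", 10, True), ("reference", 10, False),
    ("focal", 10, True), ("focal", 10, False), ("focal", 10, False),
    ("reference", 20, True), ("reference", 20, True),
    ("focal", 20, True), ("focal", 20, False),
]
tables = build_2x2xk(records)
```

In practice the matching scores would usually be grouped into intervals so that no matched group is too sparse; that choice is part of the matching problem and is left open here.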


3.  THE  MANTEL-HAENSZEL  PROCEDURE 


In the $i$-th matched group, the odds that a reference group member gets item
j correct is $A_i/B_i$, while the corresponding odds for a focal group member is
$C_i/D_i$. The MH procedure measures the advantage (or disadvantage) on item j that
reference group members have relative to their matched focal group colleagues by
the ratio of these two odds. This gives us the odds-ratio estimate

$$\hat{\alpha}_i = \frac{A_i}{B_i} \Big/ \frac{C_i}{D_i} = \frac{A_i D_i}{B_i C_i}. \tag{2}$$

The $\hat{\alpha}_i$ estimate a population cross-product (or odds) ratio,
$\alpha_i$, for the $i$-th matched group.

The Mantel-Haenszel common-odds-ratio estimate is a weighted average of the
$\hat{\alpha}_i$ that uses the following weighted formula:

$$\hat{\alpha}_{MH} = \frac{\sum_i w_i \hat{\alpha}_i}{\sum_i w_i}, \tag{3}$$

where

$$w_i = \frac{B_i C_i}{n_{+i}}. \tag{4}$$

Substituting (4) into (3) yields the usual formula for $\hat{\alpha}_{MH}$:

$$\hat{\alpha}_{MH} = \frac{\sum_i A_i D_i / n_{+i}}{\sum_i B_i C_i / n_{+i}}. \tag{5}$$

The Mantel-Haenszel estimate, $\hat{\alpha}_{MH}$, is the average factor by which
the likelihood that a reference group member gets item j correct exceeds the
corresponding likelihood for comparable focal group members. (Likelihood is
measured by the odds of getting item j correct.) For example, if
$\hat{\alpha}_{MH} = 1$ then reference and focal group members are, averaging
across all the matched groups, equally likely to be correct on the item. When
$\hat{\alpha}_{MH} > 1$ the reference group has the advantage, whereas when
$\hat{\alpha}_{MH} < 1$ the focal group has the advantage.
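
A minimal sketch of the estimate follows, assuming each matched group is supplied as a tuple of counts (A, B, C, D): reference right, reference wrong, focal right, focal wrong. The function names and the three example tables are hypothetical. The second function computes the weighted average of the per-group odds ratios with weights BC/n, to show that it agrees with the direct formula.

```python
def mh_odds_ratio(tables):
    """Mantel-Haenszel common odds ratio: sum(A*D/n) / sum(B*C/n)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den  # undefined if every matched group has B*C == 0


def weighted_average_form(tables):
    """Same estimate as a weighted average of per-group odds ratios."""
    weights = [b * c / (a + b + c + d) for a, b, c, d in tables]
    ratios = [(a * d) / (b * c) for a, b, c, d in tables]
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)


# Hypothetical counts for three matched groups of 100 examinees each.
example_tables = [(40, 10, 30, 20), (30, 20, 20, 30), (20, 30, 10, 40)]
alpha_mh = mh_odds_ratio(example_tables)  # 25 / 10 = 2.5
```

With these counts the reference group's odds of a correct answer are, on average, 2.5 times those of matched focal group members. The weighted-average form breaks down when some matched group has B or C equal to zero, which is one practical reason to compute the estimate from the direct formula.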

Associated with the estimate $\hat{\alpha}_{MH}$ is a one-degree-of-freedom
chi-square test of the hypothesis that all of the population cross-product ratios
in all of the 2x2 layers of the 2x2xK table are unity (i.e., $\alpha_i = 1$ for
all $i$). This test is given by the formula:

$$\chi^2_{MH} = \frac{\left( \left| \sum_i A_i - \sum_i \mu_i \right| - \tfrac{1}{2} \right)^2}{\sum_i \sigma_i^2}, \tag{6}$$

where

$$\mu_i = E(A_i \mid \alpha_i = 1) = \frac{n_{Ri} R_i}{n_{+i}} \tag{7}$$

and

$$\sigma_i^2 = \mathrm{Var}(A_i \mid \alpha_i = 1) = \frac{n_{Ri}\, n_{Fi}\, R_i\, W_i}{n_{+i}^2\, (n_{+i} - 1)}. \tag{8}$$

The $\chi^2_{MH}$ from (6) will be large if $\hat{\alpha}_{MH}$ differs from 1.0
significantly in either direction. Thus, this test will detect differential item
performance that favors either the reference or the focal group.
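
The test statistic can be sketched in the same style. The code assumes the standard continuity-corrected Mantel-Haenszel form, with the expectation and variance of A taken under the hypothesis of no differential performance; the function name and the example counts are hypothetical, and every matched group is assumed to contain at least two examinees.

```python
def mh_chi_square(tables):
    """Continuity-corrected MH chi-square (1 df) from (A, B, C, D) tuples."""
    sum_a = sum_mu = sum_var = 0.0
    for a, b, c, d in tables:
        n_ref, n_foc = a + b, c + d      # row totals
        right, wrong = a + c, b + d      # column totals
        n = n_ref + n_foc                # requires n >= 2
        sum_a += a
        sum_mu += n_ref * right / n      # E(A) when the odds ratio is 1
        sum_var += n_ref * n_foc * right * wrong / (n ** 2 * (n - 1))
    return (abs(sum_a - sum_mu) - 0.5) ** 2 / sum_var


# Hypothetical counts for three matched groups of 100 examinees each.
example_tables = [(40, 10, 30, 20), (30, 20, 20, 30), (20, 30, 10, 40)]
chi_sq = mh_chi_square(example_tables)
```

The result is referred to a chi-square distribution with one degree of freedom, whose 5 percent critical value is about 3.84.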


4.  DISCUSSION 

The MH procedure is closely related to log-linear model procedures for
estimating a constant two-way interaction across a series of 2x2 tables (see
Bishop, Fienberg, and Holland, 1975). In practical terms, $\hat{\alpha}_{MH}$ is
usually nearly identical to estimates of the common-odds-ratio that involve
complicated iterative procedures. While the formula for $\hat{\alpha}_{MH}$ is a
simple weighted average of the sample odds-ratios, $\hat{\alpha}_i$, it has been
shown (Breslow, 1981) that, over the range of values relevant to this application
of the MH procedure, $\hat{\alpha}_{MH}$ is nearly optimal as an estimator. In
other words, no other estimate of the common-odds-ratio can have a substantially
smaller variance. The chi-square test based on $\chi^2_{MH}$ is of high power
because it is concentrated into a single degree of freedom rather than dissipated
across several degrees of freedom.

If there is more than one pair of groups that could serve as the reference
and focal group in an analysis then values for $\hat{\alpha}_{MH}$ and
$\chi^2_{MH}$ can be computed for all such pairings.

The parameter $\Delta = -2.35 \ln(\alpha)$ is (approximately) in the scale of
differences in delta-units of difficulty, where delta-units are those used by ETS
in their normal item analysis procedures. This transformation can be used to put
$\hat{\alpha}_{MH}$ values into units which are familiar to those who use the
delta-scale in test construction and analysis:

$$\Delta_{MH} = -2.35 \ln(\hat{\alpha}_{MH}). \tag{9}$$

Thus, $\Delta_{MH} = -1.0$ means that the focal group found the item one
delta-unit harder than did comparable members of the reference group. The
parameter $\Delta_{MH}$ is similar to an average shift to the right of
$-\Delta_{MH}$ in the ICC of the focal group relative to the ICC of the reference
group in an IRT model (as estimated by LOGIST).
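
The delta-scale transformation is a one-line computation. The sketch below uses only the constant -2.35 given above; the function name is hypothetical.

```python
import math


def mh_delta(alpha_mh):
    """Convert an MH common odds ratio to the delta scale: -2.35 * ln(alpha)."""
    return -2.35 * math.log(alpha_mh)


no_difference = mh_delta(1.0)              # odds ratio of 1 maps to 0
reference_favored = mh_delta(2.5)          # negative: item harder for focal group
focal_favored = mh_delta(1.0 / 2.5)        # positive, same magnitude
```

Because the transformation is a log, reciprocal odds ratios map to delta values of equal size and opposite sign, so advantages for either group are reported on a symmetric scale.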


The MH procedure can easily be expanded to include an analysis of distractor
choice for multiple choice tests. For five-choice responses the 2x2 table
in (1) is replaced by the following 2x6 table:

                            Response on item j
                 A     B     C*    D     E     Omit    Total
    Reference                                          $n_{Ri}$
    Focal                                              $n_{Fi}$
    Total                                              $n_{+i}$

C* is the correct answer, for example. Then the MH procedure is applied to the five
2x2 tables formed by juxtaposing the column for the correct answer with a
column for one of the five ways of producing incorrect answers. E.g.,

                 C*    A          C*    B          ...     C*    Omit
    Reference
    Focal

This yields five MH cross-product estimates and five chi-square tests for each
item. In some cases these may be used to see if a significant value of
$\chi^2_{MH}$ is due to a single type of incorrect answer.
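
The distractor analysis can be sketched as follows. Two assumptions are made that the text does not spell out: each matched group is supplied as a pair of response-count dictionaries (reference, focal), and the total used in each reduced 2x2 table is the number of examinees in that matched group who chose either the correct answer or the distractor in question. All names and counts are hypothetical.

```python
def distractor_odds_ratios(strata, correct):
    """MH odds ratio for the correct answer versus each incorrect response.

    strata: list of (reference_counts, focal_counts) dicts, one per matched
    group, keyed by response option (including "Omit").
    """
    options = [opt for opt in strata[0][0] if opt != correct]
    result = {}
    for opt in options:
        num = den = 0.0
        for ref, foc in strata:
            a, b = ref[correct], ref[opt]   # reference: correct vs. this option
            c, d = foc[correct], foc[opt]   # focal: correct vs. this option
            n = a + b + c + d
            if n == 0:
                continue                    # empty reduced table adds nothing
            num += a * d / n
            den += b * c / n
        result[opt] = num / den
    return result


# Hypothetical response counts for two matched groups; "C" is the key.
strata = [
    ({"A": 5, "B": 5, "C": 30, "D": 5, "E": 3, "Omit": 2},
     {"A": 10, "B": 5, "C": 25, "D": 5, "E": 3, "Omit": 2}),
    ({"A": 4, "B": 6, "C": 20, "D": 10, "E": 6, "Omit": 4},
     {"A": 8, "B": 6, "C": 16, "D": 10, "E": 6, "Omit": 4}),
]
ratios = distractor_odds_ratios(strata, correct="C")
```

In this made-up example the ratio for distractor A is the largest of the five, which is the pattern one would look for when a significant overall result is suspected to come from a single incorrect answer.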

There are a number of important research issues that need to be addressed
in the use of the MH procedure in the study of differential item performance.
What aspects of the criteria for matching examinees seriously affect
$\hat{\alpha}_{MH}$ in practical settings: the reliability of the criteria, the
fineness of the matching, the use of other examinee attributes, etc.? How stable
are the values of $\hat{\alpha}_{MH}$ across different examinee populations? What
are the relationships between the values of $\hat{\alpha}_{MH}$ and other
statistical indices used to construct tests, i.e., difficulty and discrimination?
How should values of $\hat{\alpha}_{MH}$ for several pairs of reference and focal
groups be combined for the same test item?

The MH procedure promises to be a relatively inexpensive and yet statistically
powerful technique for identifying test questions that are potentially
"biased" or unfair in some way to identified groups of examinees. ETS is
currently embarked on a variety of research projects to see how to best use this
tool for such purposes.

REVIEWING AN ITEM POOL FOR ITS SENSITIVITY TO THE CONCERNS
OF WOMEN AND MINORITIES: PROCESS AND OUTCOMES


Susan Wilson Kershaw
Howard  Warner 

Educational  Testing  Service 
Princeton,  New  Jersey 


Why a Sensitivity Review

An item pool for a computerized adaptive version of the ASVAB
has been developed and must be subjected to numerous editing and
review processes before being made operational. Among these
processes is a sensitivity review of the entire item pool for
material that may be potentially offensive to minority groups
and/or women. During a sensitivity review, all test items are
screened by trained sensitivity reviewers to ensure that the test
is as free as possible from perceived bias and offensiveness. In
order for each examinee to perform at his or her optimum level of
ability, it is necessary to eliminate any material that may
convey negative or distracting messages to a particular subgroup
of the test-taking population. This type of review is
particularly important in the development of the CAT-ASVAB since
the test-taking population will be drawn broadly from many
cultural and socio-economic backgrounds and because it is
difficult to foresee what particular combination of items will be
presented to a given examinee.

The Sensitivity Review Process

ETS Objectives

The overall objective of the sensitivity review process at
ETS is to eliminate any material from tests that may be
potentially offensive or inappropriate for identifiable subgroups
of the test-taking population, or that reinforces negative
attitudes toward these subgroups. The ETS test evaluation
criteria consist of a general set of criteria that can be applied
to all people, and specific criteria that are particularly
relevant for five subgroups: Asian Americans, Black Americans,
Hispanic Americans, Native Americans, and women.

Test Evaluation Criteria

The following test evaluation criteria have been excerpted
from ETS Test Sensitivity Review Process (Hunter & Slaughter,
1980). For a more detailed description of the sensitivity review
process, refer to their document.

A.  Definitions 

Group reference items reflect the multicultural nature of our
society. There are two basic classes of subgroup reference
items: representational items and substantive items.

1. Representational items are those in which references are
made to minorities and women, but where the subject
matter content of the test is intended to measure factors
unrelated to such groups. Reading passages, charts and graphs,
pictures, cartoons, and writing ability items are most easily
adapted to this purpose.

2. Substantive items are those designed to directly measure
knowledge about a population group. Such items might
ask about the role of the Black church in Black life, the
migration patterns of Chicanos in North America, or the factors
that have led to the increasing numbers of women
enrolling in graduate-level programs.

Evaluation Requirements

All test items will be reviewed and identifiable group reference
items will be evaluated from the following perspectives:

1. Cognitive/Affective

These  two  dimensions  should  be  considered  when  reviewing  all 
group  reference  items.  The  cognitive  dimension  deals  with  the 
factual  basis  of  item  content  and  the  affective  dimension 
reflects  the  positive  or  negative  feelings  the  item  may  evoke 
on  the  part  of  group  members. 

2. Controversial Material

Highly controversial issues, such as legalized abortion or
hypotheses about genetic inferiority, must not be included in
any test item unless they are both relevant and essential
to effective measurement.

3. Examinee Perspective

All group reference items should be reviewed from the
perspective of test takers who do not have access to an answer
key. When an examinee must know the correct key to prevent an
item from reinforcing negative attitudes, the item should be
rejected. This situation most often arises when an item
writer attempts to mislead test takers who hold stereotypical
beliefs by using one or more distractors that are obvious
stereotypes (obvious to the writer). In these cases,
examinees who select a stereotype as the correct response are
not routinely informed that their response was incorrect.
Such practice may reinforce their belief in the legitimacy of
the stereotype. This is particularly likely for examinees
who subsequently receive a high score.

4.  Stereotyping 

All ETS tests will be reviewed to ensure that they do not
contain language or symbols that reinforce stereotypes judged
to be generally offensive. While it is clear that no ETS test
intentionally contains blatant references to such offensive
stereotypes, the potential exists for including material in a
test that could be perceived as manifestations of such
stereotypes by group members who take the test. Offensive
stereotypes generally imply an inferiority or deficiency
of one or more groups in physical characteristics (for
example, height, weight, attractiveness, strength) and
psychological characteristics (for example, intelligence,
ethics, emotions, behavioral patterns) generally regarded
as desirable by the majority culture.

5. Caution Words and Phrases

Through experience, those currently reviewing tests for
sensitivity have learned that certain key words and phrases
are more likely to accompany sensitive material. While for
the most part use of these words and phrases in tests is
proper and legitimate, they will receive extra attention by
sensitivity reviewers when any two or more words are found in
an item because they indicate an increased potential for
including offensive material. Examples of such words are
i9bf^Cr_'Ci Cii.iCCl.E'IQit IQOj. r.§9§i and housewife.

6. Special Review Criteria for Women's Concerns

During the past decade, a great deal of progress has been made
in identifying numerous manifestations of sexism in our
society. This progress has included several efforts to
identify and eliminate sexism in written language. Two
notable efforts in this direction were the ETS Guidelines for
Sex Fairness in Tests and Testing Programs and the McGraw-Hill
Guidelines for Equal Treatment of the Sexes. Much of the
material contained in these documents has been recast to serve
as evaluation criteria. Sensitivity reviewers will ensure
that all ETS tests are in compliance with these criteria.

7.  Underlying  Assumptions 

An underlying assumption is a subtle secondary premise in test
material that reflects an individual's ethnocentric beliefs.

8. Context Considerations

Reviews for sensitive material frequently require judgments
relative to the context in which it is presented. In some
cases, it may be necessary to measure one's knowledge of a
domain by using material that some groups may feel is
sensitive. There are four areas in which this occurs with
some frequency: historical domain, literary domain, legal
domain, and psychological domain.

Formal Structure and Procedures

The formal test sensitivity review is conducted by trained
sensitivity reviewers who are often members of the Test Development
staff at ETS. The review procedure involves an evaluation of the test
by the sensitivity reviewer in accordance with the standard test
evaluation criteria. Any comments and recommendations by the
sensitivity reviewer are documented on the Test Sensitivity Review
Form and returned to the test assembler. The test assembler discusses
recommended changes with the sensitivity reviewer. If agreement is
reached concerning changes, both sign and date the review form. If
agreement cannot be reached, the matter is referred to the area Test
Development Director and, if necessary, to an arbitration committee
for resolution.

The sensitivity review of the CAT-ASVAB item pool followed the
standard ETS procedures except for the final steps in which actual
modifications are made to unacceptable test items. Rather,
recommendations for changes by a minority panel were submitted for
Government review in a Confidential Appendix to the ETS report, The
Sensitivity of the CAT-ASVAB Item Bank (Wilson & Warner, 1985).

The CAT-ASVAB Sensitivity Review Board

The Sensitivity Review Board for this project was composed of
twelve nationally renowned educators and test developers representing
minority group concerns. Asian Americans, Black Americans, Hispanic
Americans, and both gender groups were represented on the panel.

The  following  ETS  criteria  were  considered  in  selecting 
sensitivity  reviewers: 

A.  Ability  to  perceive  offensive  material 

B. Ability to review tests from multiple perspectives, not
simply from the viewpoint of one group, or social/political
perspective.

C. Coverage among the reviewers of key subject areas such as
humanities and social sciences.

The Meeting

The  CAT-ASVAB  Sensitivity  Review  was  conducted  over  a  two-day 
period.  The  meeting  began  with  a  brief  orientation  session  to 
familiarize  panel  members  with  the  purpose,  plan,  and  procedures  for 
the sensitivity review. Each panel member had also received a booklet
containing copies of relevant sections from ETS Test Sensitivity
Review Process (Hunter & Slaughter, 1980) to review prior to the
meeting. The ETS test evaluation criteria, guidelines for recognition
of unacceptable stereotypes, caution words and phrases, and special
review criteria for women's concerns were detailed in this booklet and
were to be referred to by panel members as guidelines for item review
and reporting of results.

Following the task orientation session, panel members were
assigned to review CAT subtests in review teams. Each review team
consisted of one male and one female of different ethnic affiliations,
to ensure that the sensitivity review process was balanced across sex
and ethnicity as much as possible. The CAT-ASVAB item bank consists
of 2118 test items, representing nine different content areas:
Electronics Information, Mechanical Comprehension, Shop Information,
Automotive Information, Mathematics Knowledge, Arithmetic Reasoning,
Paragraph Comprehension, Word Knowledge, and General Science. The
2118 items were divided up approximately equally among the reviewers,
with each panel member reviewing between ZZ 4-263 items. Panel members
were assigned to subtests in pairs so that each item would be reviewed
by two people.

Description of Sensitivity Review Outcome


The following table was used to summarize the reviewers' conclusions:

                                  ACCEPTABLE                  UNACCEPTABLE
                                                        Potentially   Item Construction
    Subtest                   As Is   With Revision     Offensive     Flaw

    Automotive Information
    Mathematics Knowledge
    Shop Information
    Mechanical Comprehension
    Electronics Information
    Arithmetic Reasoning
    General Science
    Word Knowledge
    Paragraph Comprehension

    Totals

Among the "acceptable" items some were judged to be acceptable
only after modification. "Unacceptable" items were found to be so
for two quite different reasons. Sometimes this was because of
their potential offensiveness to some sub-groups of examinees. A
second reason for finding an item unacceptable was because of a
formal flaw in the item construction. This usually yielded the
situation in which the number of correct responses to an item was
unequal to one. Although some of the screened items had such flaws
we left their description to another account; that aspect of the
item pool is outside of the purview of this project. Therefore,
"unacceptable" items in the CAT-ASVAB item bank refer to items
which were found to be unacceptable from a sensitivity review
perspective.


Due to the confidential nature of this project, the results of
the sensitivity review as reported by panel members must be treated
in a secure manner. Therefore, specific details on the test items
and the sensitivity reviewers' comments and recommendations cannot
be presented in this paper, but can be found in the Confidential
Appendix to the Wilson & Warner (1985) report.


However, the panel members did make some general comments and
suggestions concerning the further improvement of the sensitivity
review process itself.

In line with these and other suggestions we recommend in
subsequent reviews the following additions/changes:

1)  extend  the  review  process  sufficiently  to  allow  a  more 
extensive  training  period.  This  is  important  initially  to 
fully  inform  the  panel  about  the  special  aspects  of  CATs, 

2) a slight modification of the review form to more closely
correspond with the summary shown in the Table above.

3) have available to the review panel an information sheet on
each item that contains:

a) the item key,

b) the content specifications that each item fulfills,
c) the statistical results of the item's pretesting,
d) comments of previous reviews (if any).

The availability of item history in a form as described in 3 above
is pro forma in traditional test development. Although our
recommendations deal with improving the ease and quality of the item
review process, it has been the experience of test developers that
such a procedure is quite useful throughout the item development
process.

calibration of an item pool with the necessary properties is
possible, but beyond the scope of this paper.

Under  such  a  CAT  system,  strong  parallels  exist  between 
the  classical  theory  and  the  IRT  CAT  system,  as  summarized 
in  Table  1 . 


TABLE 1

Classical Test Theory                 IRT CAT (fixed $se^2$ stopping)

"True score" = T; var(T)              $\theta$; var($\theta$) = 1
"Error" = E; var(E)                   error; var(error) = $se^2(\theta)$

"Observed score" = X;                 est($\theta$);
X = T + E,                            est($\theta$) = $\theta$ + error,
(independent),                        (independent),
var(X) = var(T) + var(E).             var(est($\theta$)) = 1 + $se^2(\theta)$.

$\rho^2_{XT}$ = var(T)/var(X)         $\rho^2$ = 1/[1 + $se^2(\theta)$]       (2)
from (1).                             from (1).


Using slightly different notation, Samejima (1977) notes
that $\rho^2$ in equation (2) represents the reliability, or
expected correlation between parallel test scores, for an
IRT-scored test. She notes that this form is impossible or
deceptive if $se^2(\theta)$ varies as a function of $\theta$; however,
in a CAT system, it is possible that $se^2(\theta)$ is a constant and
equation (2) is precisely correct.
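
Equation (2) is simple enough to evaluate directly. The following is a minimal sketch, assuming var(theta) = 1 as in Table 1; the function names are illustrative and not part of the original paper. It computes the reliability implied by a constant error variance, and inverts the relation to find the constant se^2 a stopping rule would have to reach for a desired reliability.

```python
def reliability_from_se2(se2: float) -> float:
    """Equation (2): rho^2 = 1 / (1 + se^2), assuming var(theta) = 1."""
    if se2 < 0:
        raise ValueError("se2 must be non-negative")
    return 1.0 / (1.0 + se2)


def se2_target_for(rho2: float) -> float:
    """Invert equation (2): the constant se^2 a fixed-se^2 stopping
    rule must reach for the test to have reliability rho2."""
    if not 0.0 < rho2 <= 1.0:
        raise ValueError("rho2 must lie in (0, 1]")
    return 1.0 / rho2 - 1.0


# Reliability implied by a few illustrative stopping targets.
for se2 in (0.05, 0.10, 0.25):
    print(f"se^2 = {se2:.2f} -> rho^2 = {reliability_from_se2(se2):.3f}")

# Stopping target needed for a reliability of 0.90.
print(f"target se^2 for rho^2 = 0.90: {se2_target_for(0.90):.3f}")
```

Under these assumptions a stopping target of se^2 = 0.25 corresponds to a reliability of 0.80, which is the kind of direct statement the fixed-se^2 rule makes possible.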

Further parallels may be drawn between the classical
theory and IRT CAT systems. In classical test theory, the
best prediction of the true score is the so-called Kelley
(1947) "regressed" estimate, which is

    $\mathrm{rest}(T) = \rho^2_{XT}\, X$    (3)

if $X$, $T$, and $E$ all have mean zero and $\mathrm{var}(T) = 1$. If one
"bends" IRT slightly, by claiming that the likelihood over $\theta$
is exactly (instead of approximately) Gaussian with mean
$\mathrm{est}(\theta)$ and variance $se^2(\theta)$, then

    $\mathrm{rest}(\theta) = \rho^2\, \mathrm{est}(\theta)$    (4)

is exactly equal to either the Bayes modal estimate of $\theta$ or
the expected a posteriori estimate of $\theta$ computed with a
population distribution which is N(0,1). The mode and the
mean become the same when "everything is Gaussian"; that is
why we are using the generic notation $\mathrm{est}(\theta)$ and
$\mathrm{rest}(\theta)$ for the un-regressed and regressed estimates of
$\theta$ respectively.
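
The equivalence claimed for equation (4) can be checked numerically. The sketch below assumes exactly what the text assumes: a N(0,1) population distribution and a Gaussian likelihood with mean est(theta) and variance se^2. The function names and the example values are illustrative only.

```python
from typing import Tuple


def regressed_estimate(est: float, se2: float) -> float:
    """Equation (4): shrink est(theta) toward the prior mean 0 by rho^2."""
    rho2 = 1.0 / (1.0 + se2)
    return rho2 * est


def gaussian_posterior(est: float, se2: float) -> Tuple[float, float]:
    """Posterior mean and variance for a N(0, 1) prior combined with a
    Gaussian likelihood of mean est and variance se2."""
    prior_precision = 1.0          # 1 / var(theta), with var(theta) = 1
    data_precision = 1.0 / se2     # 1 / se^2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * data_precision * est  # prior mean is 0
    return post_mean, post_var


est, se2 = 1.5, 0.25
post_mean, post_var = gaussian_posterior(est, se2)
print("regressed estimate:", regressed_estimate(est, se2))
print("posterior mean:    ", post_mean)
print("posterior variance:", post_var)
```

Because the posterior is Gaussian, its mean and mode coincide, so the same number serves as both the Bayes modal and the expected a posteriori estimate.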

Note that in the context of an IRT CAT system using a
fixed (equal) $se^2(\theta)$ stopping rule, the concept of
reliability is not "dead". It is, as a matter of fact,
enhanced: it is "righter" than it ever was under the
classical theory. It is righter because it is based on the
idea of equal error variance, which was never true for
classical test scores, but which can be made true in CAT.

The value of $\rho^2$ is both a prediction of the (hypothetical,
"washed-brain") test-retest correlation between estimates of
$\theta$ and the "regression" or "shrinkage" constant in the
Bayes estimates of $\theta$.

Lord and Novick (1968) distinguish among three different
"errors" in their discussion of the classical theory of
reliability and those distinctions are useful here as well.
The three error variances discussed by Lord and Novick, and
their IRT counterparts in equal-$se^2$ CAT systems, are
tabulated in Table 2.


TABLE 2


Variance of Measurement

    Classical Test Theory:  $\mathrm{var}(E) = \mathrm{var}(X)[1 - \rho_{XX'}]$
    IRT CAT:                $se^2(\theta) = \mathrm{var}[\mathrm{est}(\theta)][1 - \rho^2]$


Variance of Estimation

    Classical Test Theory:  $\mathrm{var}(T)[1 - \rho_{XX'}]$
    IRT CAT:                $1/[1 + 1/se^2(\theta)] = \mathrm{var}[\theta][1 - \rho^2]$


Variance of Prediction

    Classical Test Theory:  $\mathrm{var}(X)[1 - \rho^2_{XX'}]$
    IRT CAT:                $1/[1 + 1/se^2(\theta)] + se^2(\theta) = \mathrm{var}[\mathrm{est}(\theta)][1 - (\rho^2)^2]$


The discussion in Lord and Novick (1968, p. 67-6) is cast in
terms of standard errors, instead of the variances given
above; the standard errors are the square roots of these
variances. The standard error of measurement is the square
root of $se^2$; the standard deviation of the Bayesian
posterior over $\theta$ is the square root of the variance of
estimation; and the standard error of prediction of a
subsequent estimate of $\theta$ from the regressed estimate is the
square root of the variance of prediction. Those three
values are not generally the same, which is one of the
things wrong with defining reliability as "the degree to
which a test is free from error": which error? But they are
all functions of $\rho^2$.
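
As a worked illustration of Table 2, the sketch below computes the three IRT error variances for a single constant se^2 and checks that each can also be written as a function of rho^2. It assumes var(theta) = 1 and a Gaussian posterior, as in the text; the function name and the example value se^2 = 0.25 are illustrative.

```python
import math


def error_variances(se2: float) -> dict:
    """Three error variances for an equal-se^2 CAT with var(theta) = 1."""
    if se2 <= 0:
        raise ValueError("se2 must be positive")
    rho2 = 1.0 / (1.0 + se2)               # equation (2)
    var_est = 1.0 + se2                    # var(est(theta))
    measurement = se2                      # error variance of est(theta)
    estimation = 1.0 / (1.0 + 1.0 / se2)   # Gaussian posterior variance
    prediction = estimation + se2          # error in predicting a new est(theta)

    # Each variance is also a function of rho^2, as tabulated.
    assert math.isclose(measurement, var_est * (1.0 - rho2))
    assert math.isclose(estimation, 1.0 * (1.0 - rho2))
    assert math.isclose(prediction, var_est * (1.0 - rho2 ** 2))

    return {
        "rho2": rho2,
        "measurement": measurement,
        "estimation": estimation,
        "prediction": prediction,
    }


result = error_variances(0.25)
for name in ("measurement", "estimation", "prediction"):
    # Standard errors are the square roots of the variances.
    print(f"{name:12s} variance = {result[name]:.3f}, "
          f"standard error = {math.sqrt(result[name]):.3f}")
```

For se^2 = 0.25 the three variances come out as 0.25, 0.20, and 0.45, which makes concrete the point that "error" has three different sizes even in the simplest case.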

It is also possible and reasonable in this context to
consider the reliability of composite scores $C$ obtained as
linear combinations of estimates of two or more $\theta$s. The
simplest example is a composite of two tests. If

    $\mathrm{est}(C) = \mathrm{est}(\theta_1) + \mathrm{est}(\theta_2)
                     = (\theta_1 + \mathrm{error}_1) + (\theta_2 + \mathrm{error}_2)$,

and the $\theta$s are correlated $r$ with each other and the errors
are uncorrelated with everything, then

    $\mathrm{var}(\mathrm{est}(C)) = \mathrm{var}(\theta_1) + \mathrm{var}(\theta_2)
        + \mathrm{var}(\mathrm{error}_1) + \mathrm{var}(\mathrm{error}_2)
        + 2r\,\sqrt{\mathrm{var}(\theta_1)\,\mathrm{var}(\theta_2)}$

and

    $\mathrm{var}(C) = \mathrm{var}(\theta_1) + \mathrm{var}(\theta_2)
        + 2r\,\sqrt{\mathrm{var}(\theta_1)\,\mathrm{var}(\theta_2)}$,

so

    $\rho_{CC'} = \mathrm{var}(C)/\mathrm{var}(\mathrm{est}(C))$

as per equation (1). Generalization to composites of more
than two scales is obvious.
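
The two-test composite can be sketched in a few lines. This is a minimal example under the stated assumptions (errors uncorrelated with everything, each component administered with its own constant se^2); the function name and the numerical values are illustrative and do not come from the paper.

```python
import math


def composite_reliability(var1: float, var2: float, r: float,
                          se2_1: float, se2_2: float) -> float:
    """Reliability of est(C) = est(theta1) + est(theta2).

    var1, var2   : variances of theta1 and theta2
    r            : correlation between theta1 and theta2
    se2_1, se2_2 : constant error variances of the two estimates
    """
    cov_term = 2.0 * r * math.sqrt(var1 * var2)
    var_c = var1 + var2 + cov_term          # var(C)
    var_est_c = var_c + se2_1 + se2_2       # var(est(C))
    return var_c / var_est_c


# Two unit-variance abilities correlated 0.5, each measured with se^2 = 0.25
# (component reliability 0.80 by equation (2)).
rho_cc = composite_reliability(1.0, 1.0, 0.5, 0.25, 0.25)
print(f"composite reliability = {rho_cc:.3f}")
```

In this example the composite is more reliable than either component, because the true-score covariance adds to var(C) while the uncorrelated errors add only their own variances.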

Other  metrics 

If the test scores are to be reported in some metric
other than the $\theta$-metric, such as "expected raw score"
(E[score]), and composites are to be computed as linear
combinations of scores in that metric, then it is obligatory
that reliability be reported in the transformed metric. The
simplicity described above can still be obtained in a CAT
system if the stopping rule is based on equal $se^2$s in the
E[score] metric. This is somewhat deceptive psychologically:
equal $se^2_{E[\mathrm{score}]}$ represents very unequal $se^2$s in the
$\theta$ metric, as E[score]s at the extremes have very small
variances when the associated estimates on the ability
dimension still have very large variances. But $\rho^2$ would
have its correct meaning in the metric in which the test is
being described.

Unequal $se^2$ stopping rules

CAT systems may be implemented with stopping rules
other than those which give constant $se^2$s. Such stopping
rules clearly produce unequal $se^2$s for different values of
$\mathrm{est}(\theta)$. What happens to the concept, and the
computation, of $\rho^2$ under such alternative stopping rules?

There are several problems. The most obvious is that
$\rho^2$ cannot be computed as in equation (2), since that
requires a constant value for $se^2(\theta)$, which does not exist.
Samejima (1977) suggests that for some purposes equation (2)
may be replaced by

    $\rho^2 = 1/[1 + \mathrm{average}[se^2(\theta)]]$    (5)

in which the error variances are averaged over the
distribution of $\theta$ for the group being tested. Several
further problems immediately arise. One is that the estimate
of "reliability" obtained with equation (5) depends on the
distribution of $\theta$ used in the "average"; if it is the
theoretical population distribution, it depends on the
theory, and if it is an empirical distribution, it depends
on the sample.
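
The dependence on the averaging distribution is easy to demonstrate. The sketch below applies equation (5) to two discrete distributions of theta that both have mean 0 and variance 1 but different shapes. The error-variance function `se2` is purely hypothetical (chosen so that error grows toward the extremes), as are the two distributions; none of these values come from the paper.

```python
from typing import Sequence


def se2(theta: float) -> float:
    """Hypothetical error-variance function: larger at the extremes."""
    return 0.10 + 0.05 * theta ** 4


def average_reliability(points: Sequence[float],
                        weights: Sequence[float]) -> float:
    """Equation (5): rho^2 = 1 / (1 + average se^2), where the average is
    taken over a discrete distribution of theta."""
    total = sum(weights)
    avg_se2 = sum(w * se2(t) for t, w in zip(points, weights)) / total
    return 1.0 / (1.0 + avg_se2)


points = [-2.0, -1.0, 0.0, 1.0, 2.0]
# Both weightings have mean 0 and variance 1; only the shape differs.
spread_out = [0.0625, 0.25, 0.375, 0.25, 0.0625]
two_point = [0.0, 0.5, 0.0, 0.5, 0.0]

print(f"spread-out group: rho^2 = {average_reliability(points, spread_out):.3f}")
print(f"two-point group:  rho^2 = {average_reliability(points, two_point):.3f}")
```

The same test, with the same error-variance function, yields two different "reliabilities" solely because the groups differ in how many examinees fall where the errors are large.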

A second problem is that it is not clear what purpose
$\rho^2$ computed using equation (5) could serve. It remains
true that it is an estimate of the correlation that would be
obtained between parallel tests. However, it is not an
indicator of the size of the error variance for any
particular $\mathrm{est}(\theta)$. Here, all $\mathrm{est}(\theta)$s have
different $se^2$s and $\rho^2$ computed from (5) reflects only their
average. An average of a set of numbers which are known to vary
systematically is not particularly useful. It would be much
more useful, if the goal were to characterize the size of the
errors of measurement, to abandon the concept of reliability
altogether and report the sizes of the $se^2$s as a function of
$\theta$, in either graphical or tabular form.

We noted above that, in the equal-$se^2$ situation, the
value of either $se^2$ or $\rho^2$ is indicative of the amount of
"shrinkage" induced by the population distribution in
regressed estimates of $\theta$. Unfortunately, $\rho^2$ from (5) is
not informative about shrinkage. Indeed, there is a rather
serious theoretical problem underlying this
loss-of-usefulness of the concept of reliability. When the
$se^2$s of $\mathrm{est}(\theta)$s are unequal, each is regressed a
different amount (proportional to its own variance) in Bayesian
estimation schemes. That means that the population distribution
has differential effects on different individuals, which may be
unfair.
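
A small numerical example shows what differential shrinkage can do. It assumes the same Gaussian setup as equation (4), with a N(0,1) population distribution, but lets each examinee have a different se^2. The two examinees and their values are hypothetical.

```python
def regress(est: float, se2: float) -> float:
    """Regressed (Bayes) estimate with a N(0, 1) population distribution:
    each est(theta) is shrunk by its own factor 1 / (1 + se^2)."""
    return est / (1.0 + se2)


# (label, est(theta), se^2): B has the higher unregressed estimate
# but was measured less precisely.
examinees = [("A", 1.4, 0.10), ("B", 1.6, 0.50)]

for label, est, se2 in examinees:
    print(f"{label}: est = {est:.2f}, se^2 = {se2:.2f}, "
          f"regressed = {regress(est, se2):.3f}")
```

Examinee B starts above examinee A and ends below, so the ordering of two individuals is reversed by nothing more than the difference in their error variances.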

Consideration of the reliability of composite scores is
complicated extraordinarily by unequal $se^2$s at the
individual score level. Even in the simplest case in which a
composite is the sum of two component scores, if those two
component scores may have different $se^2$s associated with
different values, then a single value of the composite could
have a wide variety of $se^2$s. This is true because a single
value of the composite could be produced by different
combinations of $\theta_1$ and $\theta_2$, in which each $\theta$ could have
different $se^2$s which are combined to give the $se^2$ for that
particular way of obtaining that score on the composite.
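
To make this concrete, the sketch below evaluates the error variance of a two-component sum for two different ways of arriving at the same composite value. It assumes uncorrelated errors, so the component error variances add, and it uses a hypothetical error-variance function shared by both components; the numbers are illustrative only.

```python
def se2(theta_hat: float) -> float:
    """Hypothetical error-variance function: larger at the extremes."""
    return 0.10 + 0.05 * theta_hat ** 4


def composite_se2(est1: float, est2: float) -> float:
    """Error variance of est1 + est2 when the two errors are uncorrelated."""
    return se2(est1) + se2(est2)


# Two ways of obtaining a composite score of 0.
for est1, est2 in [(0.0, 0.0), (-2.0, 2.0)]:
    print(f"est1 = {est1:+.1f}, est2 = {est2:+.1f}, "
          f"composite = {est1 + est2:+.1f}, se^2 = {composite_se2(est1, est2):.2f}")
```

Both examinees receive the same composite score, yet under these assumptions one is measured with nine times the error variance of the other.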

While it is true that the distribution of $se^2$s possible
for each value of the composite could be averaged to give a
single average($se^2$) for that value of the composite score,
that could be a fairly deceptive value when applied to any
particular instance of that composite score. And each value
of the composite score would still have a different $se^2$;
those would have to be averaged again to produce a single
"reliability coefficient" for the composite score as a
whole. It is possible to do this numerically, or actually
compute empirical test-retest reliability coefficients; but
what is the use of such multiply-averaged numbers when we
know that the average value applies to no particular test
score?

Conclusion 

There is a great deal to be said for the concept of
reliability: it predicts test-retest correlation, describes
the error variance of test scores, and specifies the amount
of "shrinkage" inherent in Bayes estimators. There is even
more to be said for the construction of tests which
unambiguously have some specific reliability. It appears
that such tests must be administered by CAT systems with a
constant $se^2$ stopping rule.
Introduction and Background

This paper is based on data collected for the large Army personnel
research project titled "Improving the Selection, Classification, and Utilization
of Army Enlisted Personnel: Project A" (Eaton, Hanser, & Shields, 1985).
This project was conceptualized and planned during the 1980 to 1981 time
period, and a contract was signed with the prime contractor, Human Resources
Research Organization (HumRRO), in 1982. It is being conducted jointly by
scientists from the U.S. Army Research Institute for the Behavioral and Social
Sciences (ARI), HumRRO, the American Institutes for Research (AIR), and
Personnel Decisions Research Institute (PDRI).

Early in the planning for Project A, it was recognized that a large
proportion of the research would have to be devoted to criterion development.
Plans called for the development of several different measures of performance:
(a) tests of hands-on performance, (b) paper and pencil tests of job knowledge,
and (c) ratings of typical performance. Each of these broad categories of
criteria was further subdivided. Hands-on tests included tasks which were
specific to each Military Occupational Specialty (MOS) as well as tasks common
to all MOS. Two kinds of paper and pencil tests were constructed: (a) to
emphasize the content of formal school training, and (b) to emphasize
MOS-specific task performance. Rating forms were constructed both for
MOS-specific task performance and for non-MOS-specific Army-wide performance
that we have labelled broadly as "soldiering."

The initial impetus for developing such a comprehensive set of criterion
measures was largely a function of our underlying theory of performance
measurement. This underlying theory states rather simply that performance in a job
is multi-dimensional, and that it is not possible to capture that
multi-dimensionality using only one measurement method. A method of
measurement may be intrinsic to some tasks. For example, having the requisite
knowledge of how to take a person's blood pressure may not be the same as actually
being able to perform the task accurately, yet both are important. An
individual may score high on a paper and pencil test on this task, but might not score
as high on a hands-on test of this task. Successful performance of this task
on the job requires: (a) the knowledge of how to do the task,
(b) the physical skills to perform the task, and (c) the motivation to do it.
Or to put it in another well known way: performance = f(ability x motivation).


The views expressed in this paper are those of the author and do not
necessarily reflect the view of the U.S. Army Research Institute or the
Department of the Army.


Because of the complexity of the criterion space being measured in this
project it is extremely important that it be fully understood prior to
choosing a final set of predictors and recommending changes to the Army's selection
and classification procedures. Several recent papers by project scientists
have begun to address the issues associated with criterion development (cf.
Borman, White, Gast, & Pulakos, 1985; Campbell & Harris, 1985; Rumsey, Osborn,
& Ford, 1985). Borman et al. constructed and tested a path model of
supervisory and peer ratings to examine how each is related to other measures of
performance. They found that both job knowledge and hands-on task proficiency
are related to ratings, with the dominant path between ratings and hands-on
proficiency. They conclude, however, that "... for the most part different
methods of measuring job performance yield quite different results." Campbell
and Harris describe the results of attempting to interpret criteria using a
group of "concerned psychologists." They also present a "working model of job
performance for the domain of skilled jobs." In examining the correlation
matrices of hands-on and job knowledge tests and rating scales, they state "...
the methods correlate more highly within themselves than they do across
measures." Rumsey et al. examine the relationships between job knowledge tests and
hands-on tests of job proficiency. In each of these papers, a central theme is
the multi-dimensionality of performance and the importance of using different
measurement methods to capture performance adequately.


The intent of this paper is to further explore the criterion space
measured in Project A. Previous research has focused on aggregate measures of
performance such as total scores on hands-on or paper and pencil tests or
average ratings across several dimensions. In this paper we focus on task level
measures in order to begin to understand better the relationships between kinds
of tasks and methods of measuring performance on them. Through this we hope to
gain a better understanding of the method variance associated with measures of
task performance.


Method 


Subjects 

Data reported in this paper were collected in 1984 as part of field tests
of the criterion measures developed by Project A scientists. Participants
included first tour soldiers in two Army MOS: (a) 178 Infantrymen (MOS 11B) and
(b) 167 Medical Specialists (MOS 91A). A complete description of the data
collection methods can be found in Campbell and Harris (1985).


Variables 


Percent correct steps per task and average supervisory rating per task
provided the major variables used in these analyses. Percent correct scores
were obtained on both hands-on and written tests. For each MOS reported here,
approximately 15 tasks were scored using all three measurement methods: (a)
hands-on performance, (b) multiple choice paper and pencil test, and (c)
average supervisory rating of task performance. Approximately 15 additional tasks
per MOS were tested in the paper and pencil test, and these were also included
in the analyses. In addition, total score on a paper and pencil test focusing
on training course content, average supervisory rating on overall performance,
and Armed Services Vocational Aptitude Battery (ASVAB) subtest standard scores
were included. This resulted in a total of approximately 71 variables per MOS
to be included in these analyses. Although this is a relatively small number
of subjects given the number of variables, the limits of analysis are a
function of the number of factors extracted. These sample sizes will support the
extraction of a maximum of five to seven factors per MOS.

Analyses

Though some "feel anxious in the presence of too many partial or
semi-partial correlations" (Campbell & Harris, 1985), we decided to explore
these data using factor analysis. Our specific plans were as follows: (a)
extract a set of oblique factors for each MOS, (b) examine the inter-factor
correlation matrices, and (c) examine the patterns of loadings within and
across MOS. We used a principal axis solution with an iterative solution for
the communalities and a Promax rotation. We decided on the number of factors
to extract based on an inspection of the scree and interpretability of various
solutions. In order to conserve space, descriptive statistics and
reliabilities are not reported here. They are, however, available elsewhere
(Borman et al., 1985; Campbell & Harris, 1985; Rumsey et al., 1985).

Results  and  Discussion 


The data on the Medical Specialists yielded a five factor solution. Table
1 shows the oblique solution. Variables reported in the table are limited to
the three highest loading on any factor, any variable with an absolute loading
of greater than .50 on a cross-method factor, and any variable with loadings
greater than .50 on two or more factors.

Table 1. Rotated Factor Pattern (STD REG COEFS)

I  II  III  IV  V

80  ...              Rating: Splint Suspected Fracture <Supv>
77  ...              Rating: Put on Field/Pres Dressing <Supv>
75  ...              Rating: Perform CPR <Supv>
58  .  .  -55  .     Rating: Measure/Record Respir. <Supv>
55  .  50  .         Rating: Measure/Record Pulse <Supv>
.  57                P&P: D9-Replace Filters in M17 Mask
.  51  .  .  .       P&P: I4-Measure/Record Respirations
.  47                P&P: I9-Estab/Maintain a Sterile Fld
45                   HO: A4-Put on Field/Pres Dressing
.  54                HO: ,,9-Init a Field Med Card
.  .  68  .  .       ASVAB SUBTEST SCR-Arithmetic Reasoning
.  57  .  .          ASVAB SUBTEST SCR-Math Knowledge
.  .  52             ASVAB SUBTEST SCR-Coding Speed
49  .  .             P&P: I6-Assemble Needle & Syringe
.  .  49  .  .       P&P: K2-Draft/Fire TPR Charts
.  .  42  .  .       P&P: A6-Open Airway
.  .  40  .  .       P&P: I7-Change a Sterile Dressing
.  .  41  52  .      School: All Items
.  76  .             ASVAB SUBTEST SCR-Auto/Shop
.  71  .             ASVAB SUBTEST SCR-Electronics Information
59  .                ASVAB SUBTEST SCR-Mechanical Comprehension
.  .  .  57  .       P&P: G5-Vehicle Recognition
68                   HO: I5-Measure/Record Pulse
51                   HO: I9-Est/Maintain Sterile Field
47                   HO: I4-Measure/Record Respir.
33  35               HO: AB-Splint Suspected Fracture

As  expected,  there  are  strong  method  factors,  with  little  overlap  of 
variables  across  method  factors.  Note,  however,  that  two  ratings  overlap  with 
the  ASVAB  factors,  and  one  of  the  hands-on  tasks  overlaps  with  an  ASVAB  factor. 
Two hands-on tasks have loadings greater than .30 on Factor II, the paper and
pencil job knowledge test factor. Several of the job knowledge test tasks load
on  the  two  ASVAB  factors.  Also,  ASVAB  splits  into  two  factors,  a  math/speed 
factor  and  a  technical  factor.  Table  2  provides  the  factor  correlations. 

Table 2. Inter-Factor correlations

         I     II    III    IV     V

I       100     1     7    -11    17
II        1   100    15     27    -2
III       7    15   100     -6    19
IV      -11    27    -6    100    -8
V        17    -2    19     -8   100

Not surprisingly, the paper and pencil job knowledge test factor, Factor
II, and an ASVAB factor, Factor IV, have the highest correlation. Note,
however, that none of the ASVAB subtests have loadings of .30 or higher on Factor
II, and that it is the ASVAB technical factor which correlates highest with the
job knowledge paper and pencil test factor. The ASVAB Verbal subtest did not
meet the criteria for inclusion in this table. These results would seem to
indicate that correlations between ASVAB and paper and pencil job knowledge
measures are not simply the result of shared method variance.

The next highest inter-factor correlations are between the hands-on
factor, Factor V, and the ASVAB math/speed and rating factors, Factors III and I
respectively. While the hands-on factor is a relatively pure method factor,
its correlations with the other factors strengthen the conclusions of Borman et
al. Each method appears to measure a different but related piece of job
performance.
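
The ranking of inter-factor correlations described above can be reproduced directly from the values printed in Table 2. The sketch below is only a convenience for reading the table; the factor descriptions are taken from the discussion in the text, and the helper name `ranked_pairs` is illustrative.

```python
from itertools import combinations

# Factor labels as interpreted in the text for Medical Specialists.
FACTORS = {
    "I": "supervisory ratings",
    "II": "paper and pencil job knowledge",
    "III": "ASVAB math/speed",
    "IV": "ASVAB technical",
    "V": "hands-on",
}

# Inter-factor correlations (decimals omitted) as printed in Table 2.
R = [
    [100, 1, 7, -11, 17],
    [1, 100, 15, 27, -2],
    [7, 15, 100, -6, 19],
    [-11, 27, -6, 100, -8],
    [17, -2, 19, -8, 100],
]


def ranked_pairs(labels, matrix):
    """Return the off-diagonal factor pairs sorted from highest to
    lowest correlation."""
    pairs = [
        ((labels[i], labels[j]), matrix[i][j])
        for i, j in combinations(range(len(labels)), 2)
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


labels = list(FACTORS)
for (a, b), value in ranked_pairs(labels, R)[:3]:
    print(f"{a:>3} ({FACTORS[a]}) x {b:>3} ({FACTORS[b]}): r = {value / 100:.2f}")
```

The three largest values are the II-IV, III-V, and I-V pairs, which are the ones singled out in the discussion.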

Table  3  contains  the  oblique  promax  factor  pattern  for  Infantrymen.  Seven 
factors  were  extracted.  The  choice  of  variables  to  report  was  based  on  the 
same  rules  as  for  the  previous  table  of  loadings. 

Table 3. Rotated Factor Pattern (STD REG COEFS)

I  II  III  IV  V  VI  VII

64  .  .             P&P: E5-Oper as Station in Radio Net
64  .  .             School: All Items
61  .  .             P&P: B4-Perform Op Maint. on M16A1
59                   P&P: H1-Perform Tracked Vehicle Maint
56  .  -59           P&P: E1-Collect/Report Info
66  .                Rating: Install/Fire/Recover M18A1 <Supv>
65  .                Rating: Load/Clear M60 <Supv>
59  .                Rating: Prepare Range Card for M60 <Supv>
54  37               Rating: Mean non MOS-Specific <Supv>
50  35               Rating: Navigate on Ground <Supv>
.  38  39            Rating: Set Headspace/Timing on .50 <Supv>
51  .                P&P: G8-Estimate Range
76                   ASVAB SUBTEST SCR-Auto/Shop
74                   ASVAB SUBTEST SCR-Mechanical Comprehension
73                   ASVAB SUBTEST SCR-General Science
73                   ASVAB SUBTEST SCR-Verbal
79                   Rating: Op as Station in Radio Net <Supv>
76                   Rating: Op Radio Set AN/PRC-77 <Supv>
44                   HO: E5-Op as Station in Radio Net
31                   HO: BC-Engage Targets w LAW
68                   HO: C6-Call/Adjust Indirect Fire
67                   HO: G8-Estimate Range
55                   HO: B4-Perform Op Maint on M16A1
37                   Rating: Call/Adjust Indirect Fire <Supv>
32                   P&P: B9-Engage w Hand Grenades
                     HO: BB-Prepare Range Card for M60
58                   HO: J1-Movement in Urban Terrain
56                   HO: BA-Prepare Dragon for Firing
36  50               HO: B9-Engage Targets w Grenades
47                   HO: I1-Install/Fire/Recover M18A1
35                   P&P: BA-Prepare Dragon for Firing
.  71                ASVAB SUBTEST SCR-Numerical Operations
.  59                ASVAB SUBTEST SCR-Coding Speed
40  .  54            ASVAB SUBTEST SCR-Math Knowledge
41  .  53            ASVAB SUBTEST SCR-Arithmetic Reasoning

While similar method factors emerge, the factor space for infantrymen is
slightly more complex. The ASVAB factors III and VII are quite clean, though
Factor III and the paper and pencil job knowledge test Factor I are relatively
oblique (Table 4). These factors are substantially more correlated than are
the two ASVAB factors with each other. Note also that the ASVAB math/speed
Factor VII has a lower correlation with the paper and pencil job knowledge test
Factor I than the more technical ASVAB Factor III. If there is a simple
"written test" factor, it failed to emerge in either of these solutions.


Perhaps most interesting are Factors IV and V. Each of these factors has a
mixture of variable loadings representing different measurement methods. On
Factor IV the supervisory rating and hands-on test for operating as a radio
station in a net both load substantially. On Factor V the supervisory rating
and hands-on test for call/adjust indirect fire both load substantially, and
the paper and pencil and hands-on tests for engage targets with grenades also
both load substantially.


Table 4 gives the correlations among the factors for Infantrymen. This
solution is considerably more oblique than the solution for Medical
Specialists.


Table 4. Inter-Factor correlations

         I     II    III    IV     V     VI    VII

I       100    30    53     36    18     25    24
II       30   100    13     40    18     19    -3
III      53    13   100      6    13     -1    29
IV       36    40     6    100    21     34    -5
V        18    18    13     21   100     -2     1
The highest correlation is between Factor I, the paper and pencil job knowledge
test factor, and Factor III, the ASVAB technical factor. This result is
similar to that noted previously. The two primarily supervisory rating factors, II
and IV, are quite highly correlated with the paper and pencil test of job
knowledge factor. In fact, Factor IV correlates almost as highly with Factor I
(r=.36) as it does with the other rating factor, Factor II (r=.40). The two
hands-on test factors, V and VI, are uncorrelated with each other. Factor VI
has respectable correlations with both the paper and pencil job knowledge test
factor, Factor I, and the rating factor, Factor IV.

Conclusions 


Our tendency as psychologists is to abhor method variance as something to
be avoided. This should not necessarily be the case in the realm of job
performance measurement. Performance of a task requires first the ability and
motivation to learn the task, and second the skill, ability, and motivation to
perform it. Different methods of measuring performance (hands-on tests,
written tests, and ratings) capture slightly different aspects of performance.
Some of these relationships are apparent from the data presented above.

What remains for us is to understand which kinds of tasks are most
appropriately measured by which methods. The research reported here, while open to
several interpretations, presents a method and several examples of a way to do
this. Clearly, more research needs to be conducted into the content of the
tasks themselves and their relationships to method factors across several more
occupations than are included here.
WALK-THROUGH  PERFORMANCE  TEST  DEVELOPMENT:  LESSONS  LEARNED


Carl  J.  Taylor 
Jack  L.  Blackhurst 
Rodger  D.  Ballentine 

Air  Force  Human  Resources  Laboratory 
Brooks  Air  Force  Base,  Texas 

The  Air  Force  Human  Resources  Laboratory  (AFHRL)  is  involved  in  a 
multi-year  effort  investigating  the  feasibility  of  measuring  and  linking  job 
performance  to  Armed  Services  Vocational  Aptitude  Battery  (ASVAB)  scores.  The 
major  focus  of  this  work  is  the  development  of  a  technology  for  systematically 
obtaining  job  performance  data  as  criteria  for  validating  enlisted,  officer, 
and  civilian  selection  systems,  and  evaluating  Air  Force  training  programs. 

Planning  for  the  Air  Force's  program  of  research  in  performance  assessment 
began  several  years  ago  as  the  result  of  three  primary  requirements. 
Operational  military  and  civilian  program  managers  in  the  manpower,  personnel, 
and  training  communities  asked  AFHRL  to  develop  an  approach  for  measuring  job 
performance so that the measures could be used to assist in the evaluation of
their training and selection programs. Secondly, the manpower, personnel, and
training research community needed performance measures to serve as criteria
in their research projects. Plans for the Air Force performance measurement
effort  to  meet  these  requirements  were  already  under  development  when  a  third 
requirement  for  these  measures  came  with  the  congressional  mandate  to  test  the 
feasibility  of  validating  the  ASVAB  against  job  performance  measures. 

The  cornerstone  of  this  criterion  development  effort  is  a  work  sample 
testing  approach  known  as  Walk-Through  Performance  Testing  (WTPT).  The  WTPT 
process  is  being  developed  to  expand  the  range  of  job  tasks  measured  to 
include  tasks  which  do  not  lend  themselves  to  hands-on  testing  because  of 
cost,  time,  and/or  safety  considerations.  WTPT  combines  hands-on  task 
performance  and  interview  procedures  to  provide  a  high  fidelity  measure  of 
individual  technical  job  competence. 

The walk-through procedure involves taking the job incumbent to the work
site and administering a combination of performance tasks and interview
questions. The interview testing component will be evaluated both as a
supplement to hands-on data collection to ensure adequate sampling of the
domain of tasks in a job and as a more cost-effective substitute for hands-on
testing. A wide range of alternative job performance measures will be
developed in addition to the walk-through testing methodology. These include
peer, supervisor, and self-performance ratings, at the task, dimension, and
global levels.

Alternative  measures  will  be  developed  for  the  same  specialties  used  to 
develop  walk-through  testing  techniques.  When  job  sample,  interview,  and 
alternative forms have been developed for each of the Air Force specialties
selected for this study, their relative utility will be determined. Existing
performance  measures  such  as  technical  training  scores,  Airman  Performance 
Report  ratings,  skill-level  advancement  indices,  and  Specialty  Knowledge  Test 
scores  will  also  be  evaluated  as  possible  alternative  measures. 
Over the past two years AFHRL has developed WTPTs for four Air Force
Specialties (AFS). The Jet Engine Mechanic career field was selected as the
AFS for WTPT prototype development. As the jet engine mechanic WTPT was
developed, the process was documented (Alba and Wilcox, 1985) to serve as a
set  of  procedural  guidelines.  These  guidelines  were  then  applied  to  three 
additional  career  fields:  Air  Traffic  Control  Operator,  Avionic 
Communications  Specialist,  and  Ground  Radio  Operator.  The  desire  was  to  apply 
the  guidelines  directly  to  the  next  three  career  fields;  however,  it  was 
anticipated  that  modifications  might  be  needed  to  account  for  the 
dissimilarities  between  mechanical  and  non-mechanical  career  fields.  The 
purpose  of  this  paper  is  to  highlight  the  lessons  learned  as  the  WTPT 
Procedural  Guidelines  were  applied  to  the  other  career  fields. 

TASK SELECTION / TASK ANALYSIS


This step of WTPT development was directly transferable to the three
additional career fields. That is, computerized occupational analysis data
was used to develop a task selection plan. This plan specified the guidelines
for identifying Phase I tasks (tasks performed by 50% or more of first-term
incumbents) and Phase II tasks (tasks performed by 40% or more of first-term
incumbents in a functional area and not included in Phase I).
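
The Phase I / Phase II rule above can be sketched in a few lines of Python.
This is only an illustration of the two thresholds stated in the text; the
task names, the percentages, and the field names (pct_all, pct_area) are
hypothetical and are not taken from any occupational survey report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    pct_all: float   # assumed: % of all first-term incumbents performing the task
    pct_area: float  # assumed: % of first-term incumbents in one functional area


def assign_phase(task: Task, phase1_cut: float = 50.0,
                 phase2_cut: float = 40.0) -> Optional[str]:
    """Apply the two thresholds in order; Phase II excludes Phase I tasks."""
    if task.pct_all >= phase1_cut:
        return "Phase I"
    if task.pct_area >= phase2_cut:
        return "Phase II"
    return None  # task is not selected by either rule


# Hypothetical survey figures, for illustration only.
tasks = [
    Task("Task A", pct_all=62.0, pct_area=70.0),
    Task("Task B", pct_all=18.0, pct_area=45.0),
    Task("Task C", pct_all=10.0, pct_area=22.0),
]
phases = {t.name: assign_phase(t) for t in tasks}
```

Checking Phase I first is what enforces the "not included in Phase I"
condition on Phase II tasks.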

The  selected  tasks  were  then  reviewed/validated  by  subject  matter  experts 
(SME)  during  task  selection  workshops.  These  workshops  resulted  in  a  number 
of  tasks  being  discarded  or  moved  from  one  phase  to  another  due  to  mission 
requirement changes, technology changes, or low task difficulty levels. The
inconsistencies  between  occupational  analysis  data  and  SME  input  can  be 
attributed  to  the  age  of  the  data.  Even  though  the  occupational  survey 
reports  (OSR)  were  the  most  recent  ones  for  the  three  career  fields,  they  were 
three  to  four  years  old.  This  resulted  in  several  false  starts.  For 
instance,  based  on  OSR  data  pertaining  to  the  numbers  of  first-term  personnel 
in subareas of the ground radio operator career field, the initial focus of
task  selection  was  on  the  Military  Affiliated  Radio  System  (MARS)  area  and  not 
on  the  Mobile  Communications  (MOB)  area.  After  spending  some  time  with  SMEs, 
it  was  discovered  that,  since  the  last  OSR,  mission  emphasis  had  been 
increasing  for  the  MOBs  and  decreasing  for  the  MARS. 

This  type  of  problem  can  best  be  alleviated  in  the  future  by  ensuring  that 
OSR  data  is  current  on  career  fields  selected  for  WTPT  development. 

Communication with the major command (MAJCOM) career field functional manager
very early in the task selection plan development is also essential so that
recent or upcoming modifications in technology or mission can be built into
the WTPT.


Once  tasks  had  been  selected,  a  task  selection  workshop  was  held. 
Participants  included  the  career  field  functional  manager  and  SMEs  having 
extensive  experience  in  the  career  field.  During  the  task  selection  workshop, 
emphasis  was  placed  on  ensuring  the  tasks  were  currently  performed  and  that 
they were performed in a similar manner by everyone in the applicable
functional  area  (i.e.,  flightline,  shop,  operations,  maintenance).  If 
workshop  participants  suggest  deleting  a  task  or  moving  it  from  one  phase  to 
another,  detailed  documentation  should  be  provided  on  the  justification.  For 
each  task,  documentation  should  also  be  provided  on  equipment-related 
similarities/differences in how a task is performed, reliance on local
operating procedures and Air Force or MAJCOM regulations, whether
performers only perform parts of or all of a task, and if a task is
performed in the same manner in all units. Increasing the emphasis in these
areas  during  the  workshop  will  reduce  confusion  during  the  task  analysis  stage. 

The  objective  of  our  task  analysis  is  to  gather  information  essential  to 
WTPT  item  development.  Relevant  information  includes  beginning  and  ending 
points  for  each  specific  task,  critical  steps  for  task  accomplishment, 
logistical  requirements  for  task  completion,  required  configuration  of 
equipment,  time  critical  and  safety  steps,  effects  of  local  operating 
procedures  on  task  performance,  and  representativeness  of  the  task.  This 
information  is  gathered  by  referencing  applicable  regulations,  technical 
orders, and local operating instructions as well as discussions with SMEs
in the field. The desired result of the analysis is a comprehensive list of
steps required for successful task completion. These steps should be
generalizable to any situation in which the task might be observed and should
be  used  to  objectively  evaluate  an  individual's  performance  on  the  task. 
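
One way to picture how such a step list supports objective evaluation is a
simple step checklist. The sketch below is an assumption made for
illustration: the source does not state how steps are scored, so the
proportion-correct score, the "critical" flag, and all step names are
hypothetical.

```python
def score_task(steps, observed):
    """Score one task from a step checklist.

    steps    : list of (step_id, is_critical) pairs from the task analysis
    observed : dict mapping step_id -> True if the step was done correctly
    """
    done = [sid for sid, _ in steps if observed.get(sid, False)]
    critical_missed = [sid for sid, crit in steps
                       if crit and not observed.get(sid, False)]
    return {
        "proportion_correct": len(done) / len(steps),
        "critical_missed": critical_missed,
    }


# Hypothetical four-step task; steps s2 and s4 are flagged as critical.
steps = [("s1", False), ("s2", True), ("s3", False), ("s4", True)]
observed = {"s1": True, "s2": True, "s3": False, "s4": True}
result = score_task(steps, observed)
```

Because every examinee is scored against the same step list, two observers
watching the same performance should arrive at the same score.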

For  the  ground  radio  and  avionic  communications  career  fields  this  process 
worked  very  well;  however,  problems  were  encountered  in  the  air  traffic 
control  area.  One  problem  area  is  related  to  the  differences  in  the  work 
environment  for  radar  approach  control  (RAPCON)  personnel  and  tower 
personnel.  RAPCON  controllers  depend  solely  on  various  radar  scopes  to 
separate  and  sequence  aircraft.  Tower  controllers  are  concerned  with  line  of 
sight  traffic  separation  and  use  radar  very  infrequently.  The  result  is  that 
there are only a very small number of Phase I tasks (i.e., those accomplished
in both facilities). Those tasks which are performed in both areas typically
are performed differently in the two areas. This resulted in a "separate but
equal" approach. Problems encountered in constructing the actual WTPT will be
discussed  in  the  next  section. 

Test  Development 

The  WTPT  procedural  guidelines  also  served  as  the  basis  for  test 
development  on  all  three  additional  specialties.  The  guidelines  transferred 
easily  to  the  new  specialties;  however,  problems  were  encountered  in 
developing  standardized  tests  once  the  tasks  were  identified. 

The first problem area is test security. Because of the physical
arrangement of most radio operations rooms, avionic maintenance shops, and air
traffic control facilities, it will be difficult to administer the WTPT
without allowing others to observe the tasks being evaluated. One possible
solution is to temporarily partition off the test area; however, this may
impact the unit's ability to carry out its mission. Additionally, some
radio facilities require security clearances for facility access due to their
mission. This may present problems during data collection as prospective
contractor  evaluators  have  not  obtained  clearances.  In  an  attempt  to 
alleviate  this  problem,  the  AFHRL  recently  conducted  a  study  comparing 
performance  ratings  given  by  contractors  with  those  given  by  active  duty 
personnel;  however,  results  are  unavailable  at  this  time. 

The  air  traffic  control  field  presents  unique  problems  in  terms  of 
developing  standardized  testing  situations.  For  instance,  the  radio, 
avionics, and jet engine mechanic jobs involve heavy reliance on technical
orders, are labor intensive, and are characterized by completion of tasks in a
routine, standardized manner. Most tasks can only be done one way and only
rarely does the performance of one task have any effect on subsequent tasks.
These  characteristics  make  it  relatively  easy  to  construct  standardized 
testing  environments.  In  addition,  jet  engines  or  radio  equipment  can 
actually  be  used  for  the  test  and  the  test  administrator  can  ensure  the 
equipment  is  configured  exactly  the  same  for  each  incumbent. 

Conversely,  the  air  traffic  controller's  job  is  characterized  by  many 
correct  ways  to  deal  with  a  situation,  is  intensely  interactive,  and  often  has 
a  time  critical  component.  For  test  administration  purposes  it  cannot  be 
requested  that  aircraft  fly  repeatedly  in  preset  patterns  for  purposes  of 
presenting  standardized  scenarios  to  the  incumbents  being  evaluated. 

Another characteristic of the air traffic control area is that individual
tasks for this career field require few measurable steps for completion.
This is a function of the air traffic controller's job. For instance, tuning
a radio or troubleshooting avionics equipment requires a series of specific
steps which can be readily observed and objectively evaluated. In contrast,
the observable portions of the air traffic control job consist of keying a
microphone, operating runway lights, or using the correct phraseology. These
tasks cannot be further dissected into measurable substeps.

To  address  these  problems,  AFHRL  developed  a  "job  module  approach"  for  air 
traffic control operators involved in RAPCON duties. The job module makes use
of RAPCON simulators and combines a number of tasks into standardized
scenarios. The incumbent is seated at the simulator radar screen and is
required to "work" typical traffic problems of varying levels of complexity.
Individual  tasks  are  scored  as  they  occur  in  these  scenarios.  Due  to  the 
absence  of  tower  simulators,  a  number  of  approaches  for  developing 
standardized  job  modules  for  tower  personnel  were  considered.  These 
approaches  will  be  outlined  in  the  following  section. 

Due  to  the  absence  of  tower  simulators,  AFHRL  considered  a  number  of 
approaches  to  standardized  hands-on  testing  for  tower  personnel  (e.g.,  using 
the  tower  simulator,  video  taping  scenarios  from  the  simulator,  developing  a 
new  simulator,  computer  games,  slide  presentations,  and  video  taping  live 
situations).  Each  of  these  approaches  was  evaluated  with  respect  to  test 
standardization,  objectivity,  stimulus  fidelity,  cost,  discrimination,  and 
time  to  develop.  As  a  result  of  these  comparisons,  the  use  of  video  taped 
live  traffic  is  being  pursued. 
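
A comparison of this kind can be laid out as a simple decision matrix. The
sketch below only illustrates the bookkeeping: the six criteria come from the
text, but the approach labels, the 1 to 5 ratings (higher is more favorable,
including for cost and time), and the equal weights are all hypothetical and
do not represent the ratings AFHRL actually assigned.

```python
CRITERIA = [
    "standardization", "objectivity", "stimulus_fidelity",
    "cost", "discrimination", "time_to_develop",
]


def weighted_total(ratings, weights):
    """Sum of weight * rating over the six criteria named in the text."""
    return sum(weights[c] * ratings[c] for c in CRITERIA)


# Equal weights are an assumption; the source does not state any weighting.
weights = {c: 1.0 for c in CRITERIA}

# Hypothetical ratings for two unnamed candidate approaches.
ratings = {
    "approach_a": {"standardization": 4, "objectivity": 4, "stimulus_fidelity": 3,
                   "cost": 2, "discrimination": 3, "time_to_develop": 2},
    "approach_b": {"standardization": 3, "objectivity": 3, "stimulus_fidelity": 4,
                   "cost": 4, "discrimination": 3, "time_to_develop": 4},
}

totals = {name: weighted_total(r, weights) for name, r in ratings.items()}
best = max(totals, key=totals.get)
```

Changing the weights is the natural way to test whether the preferred
approach depends on how much emphasis is given to cost or fidelity.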

Summary 


The AFHRL has developed a set of procedural guidelines which can be used
to design valid, reliable, and standardized hands-on performance assessment
instruments  for  Air  Force  enlisted  career  fields.  The  guidelines  were 
developed  around  a  mechanical  career  field  and  then  applied  to  career  fields 
representing  the  remaining  three  ASVAB  Aptitude  Index  areas  (administrative, 
electrical,  and  general).  The  major  lesson  learned  with  respect  to  applying 
the  procedural  guidelines  concerns  the  OSR  information  used  as  the  starting 
point  for  task  selection.  Every  effort  should  be  made  to  ensure  that  the  OSRs 
are current for future AFSs. The other lessons learned involve the ability to
develop  standardized  testing  environments  and  test  administration  logistics 
issues.  Being  aware  of  these  issues  at  the  beginning  of  WTPT  development  will 
result in a much smoother development.


ON THE CONTENT AND MEASUREMENT VALIDITY
OF HANDS-ON JOB PERFORMANCE TESTS


PROBLEM 

The  justification  for  using  aptitude  tests  to  help  select  enlisted  recruits 
and  assign  them  to  occupational  specialties  is  that  aptitude  tests  are  valid 
predictors  of  performance.  The  aptitude  tests  used  by  the  military  services 
have  been  extensively  validated  as  predictors  of  performance  in  occupational 
specialty  training  courses.  Their  usefulness  as  predictors  of  performance  on 
the  job,  however,  is  less  well  documented.  The  Job  Performance  Measurement 
Project  has  been  initiated  to  validate  the  Armed  Services  Vocational  Aptitude 
Battery  (ASVAB)  as  a  predictor  of  job  performance. 

The  question  then  arises  of  how  job  performance  should  be  measured. 
The  measures  favored  by  the  Joint  Service  Job  Performance  Measurement 
Working  Group  are  hands-on  job  performance  tests.  These  tests  have  intrinsic 
validity  because  of  their  high  fidelity  to  job  behavior.  Hands-on  performance 
tests,  however,  are  susceptible  to  poor  content  and  measurement  validity. 

•  Poor content validity may arise because the tests focus on skills
easy to test in the hands-on mode without including the full range
of job requirements.

•  Poor measurement validity may arise because the scoring
standards of test administrators are not calibrated and because test
security is difficult to maintain (examinees can find out what is
being tested and practice beforehand).

The  purpose  of  this  analysis  is  to  examine  the  content  and  measurement 
validity  of  prototype  hands-on  tests  for  three  Marine  Corps  specialties  — 
Ground  Radio  Repair,  Automotive  Mechanic,  and  Infantry  Rifleman  — used  in 
a  feasibility  study  to  evaluate  ASVAB  qualification  standards. 

FINDINGS 

The  findings  pertain  to  the  two  technical  specialties,  Ground  Radio 
Repair  and  Automotive  Mechanic.  Because  the  infantry  riflemen  in  the 
sample  had  limited  job  experience,  the  results  for  them  are  inconclusive. 


•  Hands-on test scores were only weakly related to amount of job
experience, as measured by months in the Marine Corps (figure I).
Test scores were expected to increase with experience, and the lack
of relationship raises questions about how well the tests represent
the full range of job requirements.

•  The ASVAB is a valid predictor of hands-on test scores for people
with 2 years or less of service in the Marine Corps, but not for
people with more than 2 years of service (table I).

•  Hands-on test scores did increase with experience for people with
low aptitude, but not for people with high aptitude (figure II).
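
The comparison of validity across experience levels can be illustrated by
computing a correlation separately within service bands. This is a minimal
sketch under stated assumptions: the records are invented, the 24-month cut
simply mirrors the "2 years" wording above, and the plain sample correlation
used here is not the population-wide estimate reported in table I.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical records: (months in service, aptitude score, hands-on score).
records = [
    (6, 40, 50), (12, 50, 55), (18, 60, 60), (24, 70, 65),
    (30, 40, 70), (36, 50, 66), (42, 60, 72), (48, 70, 68),
]

bands = {"<=24": [], ">24": []}
for months, aptitude, hands_on in records:
    bands["<=24" if months <= 24 else ">24"].append((aptitude, hands_on))

r_by_band = {
    label: pearson_r([a for a, _ in rows], [h for _, h in rows])
    for label, rows in bands.items()
}
```

With these made-up numbers the aptitude scores track hands-on scores in the
junior band and not in the senior band, which is the pattern the findings
describe.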


FIG. I: JOB PERFORMANCE RELATED TO TIME IN THE MARINE CORPS
(Plotted series: radio repairers; auto mechanics. Horizontal axis: months
in service, 0 to 60.)


These  results  suggest  that  the  hands-on  test  content  was  appropriate  for 
people  recently  assigned  to  their  first  duty  station,  but  less  appropriate  for 
people  with  more  experience,  who  perform  job  tasks  not  reflected  in  the  tests. 




The findings on measurement validity bear on the institutionalizing of
hands-on job performance tests:

•  The test administrators used different scoring standards, and the
same administrators changed their scoring standards across time
(see figure III).

•  Maintaining test security is difficult.
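
A first check on scoring standards of the kind described in the first bullet
is to compare mean scores by administrator and by testing period. The sketch
below assumes invented scores and uses the same grouping that appears in the
legend of figure III; it is an illustration, not the analysis the author ran.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical hands-on scores: (administrator, testing period, score).
scores = [
    ("Administrator 1", "1st half", 60), ("Administrator 1", "1st half", 64),
    ("Administrator 1", "2nd half", 70), ("Administrator 1", "2nd half", 74),
    ("Administrator 2", "all", 55), ("Administrator 2", "all", 57),
]

grouped = defaultdict(list)
for admin, period, score in scores:
    grouped[(admin, period)].append(score)

group_means = {key: mean(vals) for key, vals in grouped.items()}

# Gap between the most lenient and most severe group.
spread = max(group_means.values()) - min(group_means.values())
```

A large spread does not by itself prove miscalibration, since the groups of
examinees may differ, but it flags where scoring standards should be checked.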

TABLE I

VALIDITY OF THE ASVAB FOR PREDICTING
HANDS-ON TEST SCORES

Months in                         Number of
service           Validity(a)     cases

Ground Radio Repair
  15-25              .69             38
  28-35             [...]            53
  36-48              .00             37
  Total              .37            128

Automotive Mechanic
  2-14               .72             57
  15-25              .52             56
  26-34              .15             53
  35-60             -.07             54
  Total              .37            220

(a) Population-wide estimate of validity coefficient

CONCLUSIONS 

•  The ASVAB is a valid predictor of job performance, as measured by
hands-on tests.

•  But hands-on tests lack robustness:

   -  Content validity is sensitive to job experience.

   -  Measurement validity is sensitive to the calibration, or
      scoring standards, of test administrators.

•  Institutionalizing hands-on job performance tests would be
difficult.


Milton  H.  Maier 

Center  for  Naval  Analyses 

Alexandria, Virginia 22302-0268


FIG. III: HANDS-ON TEST SCORES ASSIGNED BY TEST ADMINISTRATORS
TO AUTOMOTIVE MECHANICS
(Plotted series: Administrator 1, 1st half; Administrator 1, 2nd half;
Administrator 2. Axis: hands-on test score.)


[...] in reaching the Navy's goal to improve enlisted personnel
classification through the use of job performance [...] includes the
development and evaluation of hands-on performance tests. This paper
describes the developmental process employed in the first rating to be
covered in a Congressionally [...] Joint-Service project to [...] job
performance [...] standards.


Background

In [...], Congress formally required the Armed Services to establish
methods for measuring job performance and validating enlistment
standards against them. The Navy's contribution to this coordinated
effort is entitled Performance-based Personnel Classification. Its
objectives are to investigate measurement approaches that can be used
to assess on-the-job performance and to improve the Navy's automated
classification and assignment system (CLASP; Kroeker and Rafacz, [...])
by including job performance [...].

The Navy's approach focuses on direct measurement of technical
proficiency, which follows the research strategy of the Joint-Service
project. The purpose of this large scale effort is to [...] job
performance measures for first-term enlistees with four or fewer years
of service and demonstrate their use [...] (Office of the Secretary of
Defense, [...]).

In the Joint-Service project the hands-on job sample test has been
adopted as a high fidelity benchmark measure against which other less
costly and more easily administered measures will be compared.
Therefore, it is essential to construct valid, reliable, and objective
hands-on performance tests.

Because of the extensive resources required for this large scale
effort, job performance measures will be developed initially for a
small number of Navy ratings. The Machinist's Mate (MM) rating is the
first one to be covered.


Hands-on Performance Test Development

To ensure that the job sample test for MM's adequately [...] the
technical proficiency content domain, a sequence of [...] steps were
taken (Guion, [...]). These steps, which involved the orderly reduction
of the job content universe to the [...], resulted in the identification
of the critical tasks to be included in the test.

The job content universe was defined by a comprehensive job task
inventory taken from the Navy Occupational Task Analysis Program
(NOTAP) data base. The latter was supplemented with information from:
(1) Occupational Standards; (2) Personnel Advancement Requirements
(PAR); (3) MM Personnel Qualification Standards; (4) standard operating
procedures; (5) standard maintenance procedures; (6) technical manuals;
and (7) A-school [...] objectives.

The job content domain was determined by subject matter experts' (SMEs')
judgments. The original task list was reduced to [...] tasks involving
technical proficiency for MM's [...] class frigates.

The third step involved the definition of the test content universe.
This universe consisted of those job tasks from the technical
proficiency content domain that might be observed in the test, in
addition to all conditions that might be imposed in the testing
situation and the procedures for observing and recording responses. At
this stage SME judgments were obtained on the criticalness of each task
to the operation of a [...] propulsion plant.

Finally, the test content domain was defined. Tasks were drawn from the
test content universe based on the task criticalness judgments obtained
in the preceding step. They cover the two main areas of work behavior
for MM's, namely, maintenance and watchstanding. The procedure
ultimately yielded the following tasks, which were approved as a
representative job sample by the MM training community:

A. Maintenance Tasks

   Planned Maintenance
     (i)    Sample oil
     (ii)   [...]
     (iii)  Tag out

   Corrective Maintenance
     (i)    Make gasket
     (ii)   Repair valve
     (iii)  [...] valve

B. Watchstanding Tasks

   Normal Watchstanding
     (i)    Shift and inspect water strainer
     (ii)   Shift and inspect oil strainer
     (iii)  [...] eductor
     (iv)   Operate fire pump
     (v)    Record gauge readings
     (vi)   Clean and inspect oil [...]
     (vii)  Operate oil pump

   Casualty Control
     (i)    Loss of oil pressure
     (ii)   Major oil leak
     (iii)  Loss of vacuum
     (iv)   Hot bearing

[...] the task selection procedures the following [...] was followed.
First, the behavioral elements [...] were identified. This was
relatively easy [...] fully proceduralized job performance aids that
[...] the step-by-step requirements of watchstanding tasks. [...] steps
are outlined in the Engineering Operational [...] and Engineering
Operational Casualty Control [...]. The specifications of the
behavioral elements for maintenance tasks were obtained from fleet SMEs
and [...] the MM training community.

Following the delineation of the behavioral elements associated with
each task, MM's aboard two frigates were observed as they performed the
tasks. A few tasks were eliminated because of the lack of [...]
standardization across ships. It was at this point that the set of
tasks was presented to the MM training community for approval.

After the performance of the tasks was observed aboard the [...], a
structured observation form was generated. Draft forms were reviewed
for accuracy and conformity to conventional operating procedures by
SMEs. Next an experimental version of the instrument was administered
and revised.


The initial version was administered to appren[...]. As a result of
this tryout some [...] were [...]. One of our discoveries [...] job
specialization even [...], whereas we had expected all first-term [...]
of the watchstanding tasks [...] engineering spaces. This led to the
development [...] for the two major MM work spaces: one [...] and the
other for the Auxiliary (Generator) [...]. Maintenance tasks were
developed for administration [...] rather than on board a ship.

[...] scoring key was prepared in which each behavioral [...]


Observer Training


[...] that results from hands-on job sample tests are [...], examiners
must be trained to act as unbiased observers of test performance.
Therefore, it was necessary to develop a training program for
administrators and scorers of the tests.

A training package using a role-playing procedure was developed. The
procedure requires that each person to be trained take a turn playing
the role of scorer, examinee, and observer. The role of observer
differs from that of scorer in that the former scores the test but does
not otherwise interact with the examinee. The rationale underlying this
procedure was the expected convergence of scorer behavior as scorers
and observers compare and discuss points at which their observations
and [...].

Two three-member teams (four contractor and two NPRDC personnel)
participated in the training program on board several San Diego-based
ships. The first training day was devoted to orientation and the second
to reviewing standardization procedures, administration instructions
and scoring processes. The next six days were spent in role playing and
discussion sessions on board a frigate. The training steps were
repeated for all job tasks at both work stations as well as at the
pierside maintenance test station.


Data Collection

Testing is currently in progress and is being done under contract with
a team of four test observers, all of whom are [...] MM's. Each has
approximately twenty years of experience and has attained the minimum
paygrade of E-7. The remainder of the team, who provide periodic
quality control checks, consists of two military members of NPRDC.

[...], we will have scores on the hands-on test for about [...]
first-term MM's who are serving aboard twenty-eight [...] frigates.
These ships are located at seven test sites including San Diego, Long
Beach, Charleston, Norfolk, Pearl Harbor, Newport, and Mayport FL.


Performance Test Evaluation


Several analyses will be applied to the data collected. These will
include the derivation of distributional characteristics, [...]
analyses, and the calculation of reliability and validity indices.
Initially, we will place a great deal of emphasis on methodology
arising from Generalizability Theory.

Performance measures may contain the same sources of error as
traditional paper and pencil measures; namely, instability of responses
from one occasion to another, nonequivalence of supposedly parallel
forms, and heterogeneous [...]. Generalizability Theory (Shavelson and
Webb, [...]) will be used to estimate the magnitude of each of these
[...], individually and in combination. [...] the hands-on test may
result, on the one hand, from [...] and ability factors and, on the
other, [...] features of the testing situation. These [...] reflect
differences in MM job experience, differences [...], and differences in
MM qualifications [...] the different testing locations.

By making extraneous variance sources explicit through a
generalizability framework, it is possible to identify where error in
the criterion measurement is occurring. Consequently, the
generalizability framework creates a conceptual guide to job
performance research and increases awareness of potential [...] that
need to be controlled or monitored during test administration.

The main questions of interest within the context of a generalizability
design involve different possible sources of error variance. The following
sources appear most plausible as [...] factors:

1. Amount of experience
2. Type of work station
3. Type of entry level training
4. Location of duty station
5. Mode of equipment
6. Instructions during testing
7. Equipment source
8. Observer differences

[...] the facets of the generalizability design for which [...] estimates
will be made. [...] will be [...] observed so that sources of [...] from the
administration of the instrument will be [...] of potential sources of error
variance associated with [...] equipment [...] to be evaluated.
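
As an illustration of how variance estimates of this kind might be obtained, the following Python sketch estimates variance components for the simplest crossed design, examinees by observers, from the usual mean squares. It is an illustrative assumption, not the analysis planned in this study: the function names `g_study` and `g_coefficient`, the single observer facet, and the sample scores are hypothetical, and negative estimates are set to zero by convention.

```python
def g_study(scores):
    """Estimate variance components for a crossed persons x observers design.

    scores[p][o] is the score given to examinee p by observer o
    (hypothetical layout, one score per cell, at least 2 rows and 2 columns).
    """
    n_p = len(scores)
    n_o = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_o)
    p_means = [sum(row) / n_o for row in scores]
    o_means = [sum(scores[p][o] for p in range(n_p)) / n_p for o in range(n_o)]

    # Mean squares for persons, observers, and the residual (interaction + error).
    ms_p = n_o * sum((m - grand) ** 2 for m in p_means) / (n_p - 1)
    ms_o = n_p * sum((m - grand) ** 2 for m in o_means) / (n_o - 1)
    ss_res = sum(
        (scores[p][o] - p_means[p] - o_means[o] + grand) ** 2
        for p in range(n_p)
        for o in range(n_o)
    )
    ms_res = ss_res / ((n_p - 1) * (n_o - 1))

    # Solve the expected mean square equations; clamp negative estimates to zero.
    var_p = max((ms_p - ms_res) / n_o, 0.0)
    var_o = max((ms_o - ms_res) / n_p, 0.0)
    return {"person": var_p, "observer": var_o, "residual": ms_res}


def g_coefficient(components, n_observers):
    """Relative generalizability coefficient when averaging over n_observers."""
    relative_error = components["residual"] / n_observers
    denominator = components["person"] + relative_error
    return components["person"] / denominator if denominator > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical scores: three examinees, each rated by two observers.
    example = [[1, 3], [2, 2], [6, 4]]
    components = g_study(example)
    print(components)
    print(g_coefficient(components, n_observers=2))
```

A large observer or residual component relative to the person component would suggest that observer training or additional observers are needed before scores can be interpreted as examinee proficiency.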
SUMMARY


The MM hands-on performance test is the first one to be developed within
the Navy's job performance [...] effort. [...] the set of critical tasks to
assess technical proficiency [...] a hands on test was developed. The [...]
during test development led to the construction of parallel test forms for
a [...]. A training manual for test observers was [...] to reduce the
variation in test administration and [...].

Training of the test observers has been completed and data collection has
begun on the first test ships with two observers at each [...]. In order to
gather data from [...] different homeports we will be testing for [...].

[...] generalizability models will be used to estimate [...] associated with
hands on tests in the [...]. These sources will include observer
discrepancies, [...] of parallel forms [...] test [...] conditions, and the
[...] of test security. The results of these analyses will be used to guide
subsequent [...]


References

Guion, R. M. (April 1979). Principles of work sample testing: III.
Construction and evaluation of work sample tests (ARI TR-[...]).
Alexandria: Army Research Institute.

Kroeker, L. P., & Rafacz, B. A. (November 198[?]). Classification and
assignment within PRIDE (CLASP): A recruit assignment model (NPRDC TR
[...]). San Diego: Navy Personnel Research and Development Center.
(AD [...])

Office of the Assistant Secretary of Defense (MI&L). ([...]). Joint-Service
efforts to link enlistment standards to job performance: Third Annual
Report to the House Committee on Appropriations. Washington, DC.

Shavelson, R., & Webb, N. Generalizability theory: 1973-1980. British
Journal of Mathematical and Statistical Psychology, 34, 133-166, 1981.


'  'Mi  i  \I  I  < -i  \\1>  -i'\  I  i  1  j\>, 
i  ok  \  U  i  M  V  HIM  '  I1'  MM  I  - 

F ■  ■ r > t ■ :  ’  1  ;  t,  «  : ■  - 1  \.  luiri'i 
1  !<  1  U  1 1

pt> 

t  <  i  "*>  *'  *'»  . 

sue  b.  s,  i  -  p  !  e  -  often 

:  e  !  i 

s’,  S  —  ill  o 

f  1  ’  f  e  1  e 

r.cnt 

X  .it  j)* 

*  r 

toi  ".an.  e  t  h  a  a  pel  -  .n 

f  m  t 

;n> i  nat at  a  1  ! 

V  ,  .  Ill 

(  ii.ii' 

IVllMU  \ 

t 

111*  ot  ho  I  1  !  MU  I"  CV  J  . 

t  r  • 

>b  s  i  •  p  h'  '  i  1 1 

isisi  ..nl 

<  •! 

'-'l  J,'  t ,r 

•t 

s  o t  las’.-,  r  ii  !,  -e  "  i 

;rr !  or  -  ith  f . 


;it 

ft 


Abstracted job samples, on the other hand, are representations of a
process or product of performance that have been disassociated from the
contexts that naturally accompany them. The appearance of the abstracted
sample, therefore, is inevitably altered, sometimes [...] but sometimes
radically. An abstracted sample almost always looks more like a test than a
job activity, whether or not it is recognizable in terms of its parent
task.


An abstracted job sample can be derived from any component or process of
task performance that discriminates among performers, regardless of whether
the resulting measure has a "hands-on," joblike appearance. An abstracted
sample may measure skill or physical ability or job knowledge. Soldering
skill required to repair electronic equipment has been evaluated by having
persons make solder joints on a "breadboard"; one of the physical
requirements for sanitation workers and firemen has been tested by having
them lift and carry heavy weights. Perhaps the most familiar and frequently
used abstracted job sample is the multiple-choice test of job information.

For evaluating job proficiency, abstracted measures of job knowledge are
usually preferred to direct measures because of their efficiency. It is
generally accepted that for many tasks a capability to perform can
reasonably be inferred if the acquisition and retention of task knowledge
have been determined. Yet, direct measures, despite their inefficiency,
still have considerable appeal [...] their use avoids the analysis and
dissociation of performance inherent in developing an abstracted sample. In
the process of transforming a task to an abstracted sample, discriminating
requirements may be lost and artificial requirements must generally be
introduced.
The examinee was required both to describe actions he would take and
to mark the particular equipment components on the photographs that he would
observe and manipulate. The photographs had been taken with a wide-angle
lens to capture both the equipment relevant to a given task and a good deal
of irrelevant equipment in the surround. The pictured equipment provided
both a context for recall of task procedures and distractors that could elicit
incorrect responses.

In the course of developing the abstracted job measure, it was discovered
that the appearance and sometimes the makeup and location of equipment varied
among ships of even the same type and class (Knox class frigates) for which
the test was developed. Such variability in ship construction is apparently
common when ships are built or overhauled at different times, by different
manufacturers, in different shipyards. (The arrangement of equipment on a
particular ship may sometimes be documented by ad hoc survey, but we are not
aware of any compilation of this information across ships.)

The result, of course, is that examinees will vary in familiarity with
the pictures of equipment appearing on such a test taken on a particular ship,
solely as a consequence of their experiences on other ships with differently
configured equipment. The impact of such variation on test performance is
likely to be greatest among apprentice job incumbents, for whom the present
test was intended, since such persons are least familiar with such variation.

Notice, by contrast, that these variations among ships do not necessarily
cause problems in measuring proficiency by direct job sample. In the latter,
variations in equipment and work materials in the normal work setting need
not lead to differences among examinees in familiarity with a particular set
of equipment selected to be represented on the test. Direct job sample tests
are often administered in a person's own work situation and use the very same
equipment and materials that he or she works with on the job, and standardized
score sheets for recording performance can be prepared in a general enough
way to be independent of any variations in equipment appearance in the
different work settings where the test is administered.

If an abstracted measure, on the other hand, attempts to faithfully render
shipboard equipment, as by photograph, the features of the equipment must
necessarily be those of particular machinery on a particular ship. Although
the consequences of differences in equipment have not been investigated here,
the question of test fairness naturally arises when some examinees are
presented with familiar materials and others are not.

The effects of such situational variations on abstracted measurement
can be avoided by (1) generalizing the abstracted measure so that situational
variation is irrelevant or (2) restricting the abstracted measure solely to
tasks that use equipment and materials common to all work settings. It may
be, of course, that neither of these options is acceptable for making valid
inferences about proficiency and that only a direct job sample may remain
viable.
Inter-Service Transfer of
Job Performance Measurement Technology

Capt Jack L. Blackhurst, USAF
Air Force Human Resources Laboratory
Brooks Air Force Base, Texas 78235-5601

Herbert George Baker, PhD
Navy Personnel Research and Development Center
San Diego, California 92152-6800

The Department of Defense is coordinating a Joint-Service Job Performance
Measurement (JPM) Project in response to a congressional mandate to
establish linkages between job performance and enlistment standards. The
objectives of this Joint-Service Project are to: (1) develop prototype
methodologies for the measurement of job performance; and (2) if feasible,
link enlistment standards to on-the-job performance. The overall program
will develop measurement techniques for the collection of valid, accurate,
and reliable hands-on job performance information that can be related to
recruit capabilities. These measures, in turn, will be used as benchmarks
against which surrogate indices of performance (less expensive, easier to
administer tests and/or existing performance information) will be evaluated
as substitutes for the more expensive, labor intensive, hands-on performance
measures. The long-term goal of the research and development program is for
each Service to establish an operational performance measurement program so
that job performance data will be available for use in evaluating personnel
and training policies and practices.

Each of the Services is responsible for developing performance measurement
technologies on occupational specialties which are comparable across
Services. This approach permits the Services to share the technology for
similar specialties. A part of the basic strategy of the Joint-Service JPM
project is to determine if the technologies developed by one Service can be
utilized successfully in other Services. The Air Force is responsible for
taking the lead in inter-Service technology transfer. This paper will
discuss the transfer to the Navy of a technology developed by the Air Force.

Air  Force  Testing  Technology 

The Air Force has developed a comprehensive performance assessment system
for the Jet Engine Mechanic Specialty using the combination of Walk-Through
Performance Testing (WTPT), rating forms, and related questionnaires. WTPT
is a task-level job performance measurement system that combines hands-on
task performance and interview procedures to provide a high fidelity
measure of an individual's technical job competence. The hands-on component
resembles a traditional work sample designed to measure performance on a
sample of tasks that have survived the imposition of essential measurement
constraints such as testing time/cost or risk of personal injury/equipment
damage. The interview component has been added as a means of assessing
those tasks that would have been eliminated because of these constraints.
Interview testing takes place in the work setting and requires the
evaluator to assess an incumbent's proficiency on a task by asking
questions designed to uncover knowledge and procedural strengths and
weaknesses related to the performance of that task. The incumbent can
answer the questions by a combination of verbal responses, gestures, and
demonstration. The interview testing component will be evaluated both as a
more cost-effective surrogate and supplement to hands-on measures.

In addition to the WTPT, the Air Force has developed a wide range of
rating forms as potential surrogate job performance measures. These include
peer, supervisor, and self ratings at four different levels of measurement
specificity: task, dimension, global, and Air Force-wide. In addition, a
set of questionnaires was also developed to assess job experience and level
of motivation. A detailed discussion of the development of the Air Force
performance assessment system can be found in Gould and Hedge (1983) and
Hedge (1984).

Results to Date

The Air Force developed performance measures for the jet engine mechanic
(AFS 426X2) on the three most representative engine types (J-79, J-57,
TF-33) currently used by the Air Force. The Air Force, Navy, and Marines
all use the J-79 engine in F-4 (fighter) aircraft and have first-term jet
engine mechanics who maintain this engine. The Air Force and Navy began
discussions in the summer of 1984 regarding the feasibility of transferring
the performance measures developed for the Air Force J-79 jet engine
mechanic to the Navy J-79 mechanic. These discussions led to a transfer
plan outlining individual Service responsibilities and a three-phase effort
to transfer the technology. Phase I was a feasibility study to determine if
the transfer could successfully be accomplished. Phase II was the
modification of the instruments and procedures, including a pilot test.
Phase III is the actual data collection and analysis. The Air Force has
responsibility for Phases I and II, with the Navy responsible for Phase III.

To begin the effort, representatives from both Service research
laboratories, contractor personnel, Navy training personnel, and two Marine
J-79 jet engine mechanic experts attended a workshop at the Air Force Human
Resources Laboratory in April 1985. The workshop laid the groundwork for
the completion of the task. The Marine subject matter experts (SMEs)
reviewed the Air Force instruments and test procedures to determine the
extent of change needed. Their review indicated that the measures could be
transferred with minimal modifications. Following the workshop, several
field visits with SMEs at Navy sites were made to confirm the necessary
changes, as well as to examine the feasibility of testing Navy mechanics.
In addition, appropriate Navy documents (Occupational Survey Report,
Training Outlines, etc.) were reviewed to ensure that the tasks represented
in the Air Force tests were appropriate for testing Navy and Marine
personnel.
A second workshop was held in July 1985 at the Navy Personnel Research
and Development Center to receive the contractor's report on the feasibility
study, and to determine if the effort should continue. The report indicated
that the transfer of performance measurement technology from the Air Force
to the Navy and Marine Corps was feasible. The study found that most of the
tasks in the Air Force instruments could be used with only minor
modifications and that only two of the tasks were not performed by Navy or
Marine personnel due to an equipment difference. For example, the Air Force
J-79 engine requires the installation of a starter; however, the Navy or
Marine J-79 engine uses an air start and does not have a starter.
Therefore, two tasks related to the starter were deleted from the
performance test. The rating forms and experience and motivation
questionnaires required only minor Service terminology changes and the test
logistics did not appear to be a problem. A decision was made to continue
the technology transfer effort.

However, a major finding of the feasibility study was that the Navy is
phasing out the J-79 engine, which affects the number of first term Navy
mechanics that will be available for testing. Because of the limited number
of Navy incumbents available, the Navy suggested the inclusion of Marine
J-79 jet engine mechanics since, in addition to undergoing identical skill
training, they often work side by side with Navy jet engine mechanics on the
same engines. Thus, approval has been requested to collect data on Marines,
transforming this study into a Tri-Service JPM technology transfer effort.

Another recommendation from the feasibility study was to incorporate the
development of Navy job knowledge tests into the study design. The Navy's
job knowledge test differs from the typical knowledge test in that it
makes extensive use of photographs that the incumbent can use to answer
questions. The photographs enable the incumbent to reference the equipment,
forms, etc. that he/she would normally use on the job. Inclusion of this
additional measurement technique would allow direct comparison of surrogate
performance measures developed by different Services on the same sample of
incumbents, something which is currently not available in the Joint-Service
project. The recommendation was adopted and the measures will be developed
for administration with the other performance tests.

Progress has been made in the study. The instruments have been developed
or modified as needed and are ready to be pilot tested. Following the pilot
test, the measures will be administered to approximately 100 Navy and Marine
jet engine mechanics, using test administration procedures similar to those
used by the Air Force to collect data on their jet engine mechanics.
Results of the data analyses will be available by the Fall of 1986.

Research  Benefits 


This technology transfer effort will have four major research benefits:
(1) This study will serve as a prototype for future attempts to transfer
performance measurement technologies. Experience gained from this effort
will be invaluable to any future inter-Service performance measurement
technology transfers. (2) This is the first attempt at direct comparisons of
surrogate techniques, thus underscoring the joint-Service nature of this
project. Such comparison of useful Service surrogates provides an expanded
assessment of more cost-effective performance measures. The primary
surrogate measures from two Services will be directly comparable on the same
sample of job incumbents. (3) It allows the Services to gather performance
information on additional specialties at significant cost savings because
much of the design work has been completed. As a result, the transfer can
occur relatively quickly and at a much lower cost than developing measures
for the specialty separately in each Service. (4) This effort enhances the
total Joint-Service JPM effort. It allows the Services the opportunity to
share new techniques while enriching their own individual research programs.
Tremendous opportunity exists for generation of research ideas and for
additional analyses of alternative performance measurement methods.
Significant contributions will be made to performance measurement data
bases and research.

Summary  and  Future  Research  Possibilities 

As part of the Joint-Service JPM project, the Air Force and Navy are
transferring performance measurement technology developed by the Air Force
for jet engine mechanics to the Navy and Marine Corps. The feasibility
study and test construction phases have been completed. Data collection
will be conducted during the current year. The effort is the first of its
kind and will enhance future inter-Service transfer of performance
technology. It will provide additional insight into the comparison of
performance measurement methods. One very viable research option, a
follow-on to this effort, would be for the Services to select a common
specialty (e.g., security police or personnel) and develop multiple
surrogate measurement techniques for the same sample of incumbents in one
Service. To avoid overloading the test incumbent with performance tests, a
sample of the various techniques could be used for comparison purposes.
This study would allow for direct comparison of all the Service measurement
techniques and provide a tremendous data base from which to explore
additional research in performance measurement. Such research might include
a cost-effectiveness/utility analysis of the various surrogate measurement
techniques to determine which technique gives the maximum payoff for the
least cost or how surrogates can be combined to provide the greatest amount
of performance information.

In summary, benefits derived from this study will have a significant impact
on performance measurement research for the individual Services, for the
Joint-Service project, and for the field of industrial psychology.
SIMULATION OF INSTRUCTOR AND GROUP PROCESS
ROLES WITH MICROCOMPUTER TECHNOLOGY

Barbara L. McCombs
Denver Research Institute


Many students entering military technical training not only are deficient in basic
reading skills, study skills, and cognitive strategies, but they also are deficient in
motivational skills (McCombs & Dobrovolny, 1982). These motivational deficiencies are
reflected in trainees' inability to positively adjust to technical training requirements and
implement necessary self-management, personal responsibility, and positive self-control
strategies related to self-motivation (McCombs, 198[?]). Specific deficiencies related to
unsatisfactory technical training performance include inadequate goal setting and problem
solving skills, self-evaluation and planning skills, strategies for dealing with anxiety and
stress, and communication skills (McCombs & Dobrovolny, 198[?]). A program for
remediating these deficiencies, entitled the Motivational Skills Training Program, was
developed by McCombs and Dobrovolny (1982) and evaluated with Air Force trainees. The
program includes seven self-instructional, printed modules that have been implemented in
an instructor-led, small-group format which provides trainees with the opportunity to
practice new strategies and skills, share experiences, and develop feelings of rapport with
their instructors and peers. Evaluation data indicated that trainees liked the program and
found it helpful in their course work and personal lives. Trainees participating in the
program also had significantly higher test scores and lower test failure rates than control
group trainees (McCombs & Dobrovolny, 1982). Although these evaluation findings with
the motivational program pointed to its success, several questions remained. One set of
questions concerned training format and whether program cost effectiveness could be
enhanced by reducing instructor and/or group interaction requirements through the use of
computer-assisted instruction (CAI) for selected portions of the training. These questions
were addressed in the reported research, undertaken for the Army Research Institute.


Background. In discussing the use of computer-based media versus conventional
media such as instructors or group instruction, Clark (1983, 1984) makes the cogent point
that it is not the media per se that influence learning. Rather, it is the content and
method of instruction that are critical and the medium is merely an alternative delivery
vehicle. Clark argues that in using the medium of computers, one must focus on available
instructional theory in finding the necessary instructional methods for fostering the
desired learning outcome. He also argues that decisions to use computers are more a
matter of implementation issues such as cost, practicality, resources, and equity of access
and that this medium can be maximized to address particular implementation problems by
focusing on the computer's special features (Clark, 198[?]). Therefore, in using CAI to
simulate critical instructor or group process functions, special delivery features must be
carefully matched to content and function requirements.


Some features of CAI that are potentially useful to students who have responded
poorly to traditional methods include individualization, mastery learning, and self-pacing.
Recent research with these features, particularly the mastery learning model, however,
has raised some question about their effectiveness for the disadvantaged background, low
ability student (Covington & Omelich, 1981; Federico, 1981; Stinard & Dolphin, 1981;
Thompson, 1980). Covington and Omelich (1981) question whether instructional features
which include repeated test trials and grading against absolute standards perpetuate a
negative failure cycle for low ability, low self-confidence students. On the practical side,
however, Siegel and Simutis (1979) point out that within the Army there are many
problems associated with providing basic skills training to large numbers of individuals at
many different locations (e.g., inconsistent content quality, inconvenient training times,
inappropriate matches of skill levels and basic skills curriculum). Problems such as these
[...] ARI to explore the use of CAI for basic skills training in the Army. In initial studies
evaluating CAI for various types of skills training, Siegel and Simutis (1979) report that
[...] was at least as effective as traditional instruction, particularly if instructors are
trained in the roles required by the new technology.

An issue of concern in the successful implementation of CAI with a skills training
curriculum, then, is the role instructors play in the learning process. At a minimum, it
has been suggested that instructors have input into how CAI is used, be given short
in-service workshops wherein CAI applications are explained and demonstrated, and
receive meaningful role training (e.g., Bloom, 1984; McCombs & Dobrovolny, 1980, 198[?];
McCombs, Dobrovolny, & Lockhart, 1983; Stasz, Winkler, Shavelson, Robyn, & Feibel,
198[?]; Swing & Peterson, 1981). In addition, Jernstedt (1983) has argued that
individualized computer technologies not be used as replacements for teachers and for
group learning, and stresses the need for successful combinations of interpersonal
relations and computer technologies. Interpersonal or human functions seen as important
include a focus on peer relationships and cooperative goals, and defining leadership roles
for students and instructors such that high task engagement results. Computer functions
seen as important include frequent and varied active student interaction and the use of
visual and other sensory feedback to maintain student attention.


L-bin (1984) argues that for computer-based instruction to be effective, a learning
environment has to be created that mirrors the teaching/learning characteristics of live
instruction. Similarly, Podenski (1984) argues that this technology should simulate ideal
student-teacher interactions and free teachers for more complex tasks, such as diagnosing
learning problems, helping students develop appropriate learning strategies, and
monitoring instructional effects. Recent advancements now make it possible to include a
rich array of audio and visual capabilities within interactive CAI lessons. Ginther (1983)
discusses advances in the area of audio/speech devices that can be connected to a variety
of common microcomputers. Benefits of these devices include the reduction of reading
requirements, the provision of multisensory exploration of new information, and the
personalization of materials. Implications for how CAI might be used to simulate group
interactions can be drawn from the work of Bloom (1984), Bouton and Garth (1983),
rubber!', Omuo, & Longano (1984), Michaelson (1983), and Neale (1983). These include
the use of multiple context case histories of meaningful peer problems in which students
can interactively engage in identifying the problem, consequences, and alternative
solutions through computer-guided inquiry, imagery, and explanations.

In summary, this selective review has identified features of CAI that can be used to
simulate instructor and group process functions. In particular, careful selection of
training content, careful design of CAI strategies, the incorporation of personalization
through an integration of audio and visual capabilities, the identification of meaningful
roles for instructors, and use of the inherently motivating qualities of this medium should
contribute to the effectiveness of the CAI enhancements.


Method 


Design and development of CAI/audio segments. CAI introductory and practice
segments were designed and developed for each of the seven motivational skills modules.
A simple computer-controlled audio capability was developed to achieve the
personalization desired in the simulation of instructor and group functions. The character
"PC," created to simulate instructor functions, was designed to enact three primary roles:
facilitator, modeler, and motivator. In the facilitator role, PC helped students acquire
new concepts, skills, and strategies via introductory explanations and practice exercises.
In the modeler role, PC demonstrated the application of new concepts, skills, and

audio tapes had been recorded, pulses were added at points that coincided with
CAI screen changes. The contractor-developed audio interface consists of a specially
designed interface card which plugs into the Apple IIe game I/O port. The interface
receives the pulses from a standard slide-sync audio cassette player. These pulses trigger
screen changes and, in turn, allow the CAI software (in this case, the Apple SuperPILOT
Authoring System) to control the on/off function of the audio

allows for computer control of a linear sequence of audio messages that coincide with
particular CAI frame sequences, as well as provides for the personalization of skill
training introductions and practices, at about one-eighth the cost of videodisc technology.

Experimental design. Six experimental conditions were defined: an historical
control group (HC), current control group (CC), a CAI introduction and practice group
(CAI), a CAI introduction and instructor practice group (CAII), an instructor introduction
and CAI practice group (ICAI), and an instructor introduction and practice group (II).
Other independent variables included student scores on the General, Electrical, and
Clerical subscales of the Armed Services Vocational Aptitude Battery (ASVAB); military
rank; sex; initial judgments of self-efficacy; and initial indices of anxiety and ability to
cope with stress. Dependent variables included time to complete the first and second
course segments; test failure rates in the first and second course segments; progress index
for the entire course; and whether students attrited or graduated. (See McCombs et al., in
press, for a description of measures used.)

Subjects. Participants in the study were male and female students in the Electronic
Communications (EC) school at Ft. Sill. Students were assigned to one of the five current
experimental conditions by designated Ft. Sill personnel, using guidelines and procedures
specified by the contractor. At the conclusion of the study, data on a total of 479
students were available for analysis. The numbers of students in each condition were as
follows: 43 in the HC condition, 243 in the CC condition, 54 in the CAI condition, FI in
the CAII condition, 43 in the ICAI condition, and 47 in the II condition.

Procedures. A lF-hour training program for three Ft. Sill EC course instructors
acquainted them with the purpose of the evaluation and the training program, provided
guidelines for introducing each module and conducting the small group practice sessions,
and described procedures for implementing each experimental condition. Instructors were
also trained in the content of the CAI materials and operation of the CAI equipment. A
workshop format was used. To ensure comparability between the CAI and instructor
conditions, instructors were provided with scripts of the case studies applying to each CAI
character which they could use as part of their introductions and practice sessions for
each module. The pretest measures were administered to students on the day they began

Jernstedt's (1983) point that CAI technologies cannot be used as replacements for
teachers and group learning, and that there is a need for a synergetic combination of the
human and computer functions to achieve maximum instructional effectiveness.


The exploratory individual difference analyses have suggested, however, that the
CAI-enhanced version of the motivational training was at least equally effective for some
types of students. Of the individual difference variables available for inclusion in these
exploratory analyses, the findings with the general ability measure (ASVAB General) are
not surprising. That is, a number of studies have found CAI or other multimedia
treatments to be as effective as traditional instructor/group methods for high ability
students (e.g., Clark, 1984; Kulik, Bangert, & Williams, 1983). On the other hand, the
findings that students low in perceptions of competence subsequently perform better if
they received the CAI-enhanced version vs. the instructor/group version are somewhat
puzzling. A plausible explanation, however, may be derived from Bandura's (1982) theory
of self-efficacy, which suggests that individuals low in perceived competence do not judge
themselves as capable of handling particular situations, including interpersonal situations.
Because of their low feelings of personal adequacy, they are often threatened in
interpersonal situations and fearful of having their perceived inadequacies exposed. For
these individuals, then, it is reasonable that the nonhuman medium of CAI may provide a
less threatening learning environment, particularly for this type of self-development
training that requires considerable self-analysis and self-exposure. As Bowman (1982) has
argued, CAI advantages include (a) freedom from fear of reprisal, ridicule, or rejection;
and (b) provision for active involvement in tasks that are based on a high probability of
success. Thus, it may be that the medium of CAI is a more optimal treatment for
students whose initial perceptions of self-efficacy are low.

The preceding speculations need to be verified by further research. As noted by
Siegel and Simutis (1979), CAI's potential lies in its ability to provide individualized,
standardized, and efficient instruction, particularly to adult learners who require remedial
training. Individualization issues with microcomputer technology thus need to be
systematically explored, such that differential assignment to this medium can improve training
performance and lead to more cost-effective use of human personnel. There is little
question based on the research reported here that instructors and the group process play a
critical role in the success of motivational training and the success of individualized
computer-based approaches in general (McCombs, in press). Continued research with
microcomputer/audio technologies that can simulate instructor and group requirements
promises to contribute to a realization of the full benefits of this technology.

ASSESSING M1 TANK COMMANDERS WITH A COMPUTERIZED HAND-HELD TUTOR


Brent Bridgeman
Educational Testing Service

Theodore Post
Essex Corporation

Tank commanders on the M1 Abrams tank must be trained to very quickly
evaluate  a  battlefield  situation,  identify  the  target  which  should  be  engaged 
first,  choose  the  appropriate  weapon  and/or  ammunition  (from  among  three 
machine  guns  and  two  types  of  main  gun  ammunition),  issue  the  appropriate  fire 
command  to  the  loader  and  gunner,  direct  the  driver  to  move  or  change 
direction  if  necessary,  and  maintain  communication  with  other  tanks  in  the 
unit.  For  such  complex  tasks,  realistic  hands-on  training  is  clearly 
essential  to  reach  a  level  of  effectiveness,  or  even  survival,  on  the  modern 
battlefield.  But  hands-on  training  is  extremely  expensive  because  of  high 
costs  of  equipment  and  ammunition  as  well  as  the  costs  of  transportation  to 
the  few  areas  that  can  support  full-scale  field  exercises.  In  order  to  make 
better  use  of  the  very  limited  time  for  hands-on  training,  the  commander 
trainees  should  have  already  mastered  the  basic  prerequisite  skills  before 
going  into  the  field.  Thus,  for  example,  if  the  trainee  in  the  class 
practices  basic  elements  of  a  fire  command  until  they  are  virtually  automatic, 
he  is  more  likely  to  be  able  to  use  them  correctly  under  the  multiple 
pressures  in  the  field  environment. 

The  U.S.  Army  Research  Institute  for  the  Behavioral  and  Social  Sciences 
(ARI) has been developing a number of approaches to making traditional
training  more  effective.  Several  projects  employ  microcomputers  combined  with 
videodisks.  Such  systems  are  reasonably  realistic  and  are  relatively 
inexpensive  compared  with  hands-on  training,  but  they  are  still  costly  enough 
that  the  amount  of  time  a  given  soldier  can  spend  working  with  the  system  is 
limited.  In  addition,  such  systems  are  not  easily  portable  so  that  soldiers 
must  come  to  a  fixed  location  at  a  fixed  time  to  work  with  the  system. 
Therefore,  as  a  supplement  to  microcomputer/videodisk  technologies,  ARI 
sponsored the development of a low-cost, hand-held computerized Tutor. The
intention  was  to  make  a  device  that  was  low  enough  in  cost  (under  $150)  and 
small  enough  in  size  (no  larger  than  a  notebook)  that  it  could  be  used  by 
soldiers  in  much  the  same  way  that  they  could  use  a  textbook  or  manual.  The 
Tutor  that  was  developed  as  a  result  of  this  initiative  is  a  10"  x  11"  x  2" 
device  with  a  32  character  dot  matrix  display  screen  on  the  top,  a  keyboard  on 
the  bottom  (numbers  0-9,  letters  A-E  and  three  operational  keys:  SAY,  ERASE 
and GO) and an indentation in the center that holds an open 5" x 5" booklet.
The  use  of  a  printed  booklet  for  the  display  of  test  questions,  instructional 
text,  and  graphics  permitted  substantial  cost  savings  compared  to  systems  that 
store  this  type  of  textual  and  graphical  information  in  the  computer's  memory 
and  display  it  on  a  CRT.  The  Tutor  also  contains  a  digitized  speech  system. 
(For  a  more  complete  description  of  the  Tutor,  see  Fertner  and  Bridgeman, 
1985). 

The Tutor was originally intended to teach technical vocabulary. Its
three independent (but mutually supporting) courseware components are (a)
an instructional sequence including a pretest and explanatory text with
embedded questions, (b) a drill and practice session (called Word War) in
which  items  answered  incorrectly  initially  are  presented  again  after  just  one 
other  item  has  been  presented,  again  after  three  more  items  have  been 
presented,  etc.,  (c)  a  game  (called  Picture  Battle)  requiring  recognition  of 
an  appropriate  picture  (or  portion  of  a  picture)  given  a  spoken  stimulus. 
Because  of  the  success  of  this  approach  for  vocabulary  instruction  (Bridgeman 
and  Wisher,  1985),  ARI  focused  attention  on  other  areas  where  this  technology 
could  be  applied.  One  of  these  areas  was  instruction  in  issuing  fire  commands 
for M1 tank commanders. A contract was awarded to Educational Testing Service
(and its subcontractors Advanced Technology Laboratories and BioTechnology
Inc.) to adapt a set of existing instructional booklets for presentation on
the  Tutor.  The  remainder  of  this  paper  describes  how  the  Tutor's  three 
operational  modes  (pretest  and  explanation,  Word  War,  and  Picture  Battle)  were 
used  for  fire  commands  instruction. 

Pretest  and  Explanations 

Each  of  the  28  instructional  units  begins  with  a  brief  multiple-choice 
pretest.  The  questions  and  answer  choices  are  printed  in  the  book,  and  the 
soldier  responds  by  pushing  one  of  the  A-E  keys  on  the  keyboard.  After  the 
pretest, the Tutor instantly evaluates the soldier's performance and allows
soldiers  with  no  errors  to  proceed  immediately  to  the  next  unit.  For  soldiers 
who  made  errors,  the  Tutor  displays  the  answer  choice  they  selected  followed 
by  the  correct  response.  Thus,  this  test  review  becomes  the  first  step  in  the 
instructional  process. 
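
The pretest logic just described can be illustrated with a minimal sketch. The function names, the answer-key representation, and the returned labels are assumptions made for illustration only; the paper does not describe the Tutor's actual software.

```python
def missed_items(responses, answer_key):
    """Return (question number, selected choice, correct choice) for each error."""
    return [
        (number, chosen, correct)
        for number, (chosen, correct) in enumerate(zip(responses, answer_key), start=1)
        if chosen != correct
    ]


def after_pretest(responses, answer_key):
    """Decide what follows the pretest for one instructional unit (hypothetical labels)."""
    errors = missed_items(responses, answer_key)
    if not errors:
        # No errors: the soldier proceeds immediately to the next unit.
        return "next unit", errors
    # Otherwise each selected answer is shown followed by the correct response,
    # and the soldier then goes on to the explanatory text.
    return "review and explanatory text", errors
```

For example, `after_pretest("ABD", "ABC")` reports a single error on question 3, so the soldier would see the selected choice D followed by the correct choice C before starting the explanatory text.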

Next,  soldiers  who  made  errors  on  the  pretest  are  directed  to  begin 
reading  the  explanatory  text.  Frequent  questions  are  sprinkled  throughout  the 
text to ensure that attention and comprehension are maintained. The Tutor
provides immediate corrective feedback on these items, but errors should be
rare  if  the  soldier  is  reading  carefully. 

In the vocabulary module, the "SAY" key was used to make the Tutor
pronounce target words that were underlined in the text with code numbers
under them. The soldier pushed "SAY" and then the number under the word that
he wanted to hear. In the fire commands instruction the use of the "SAY" key
is quite different; it is used to provide a brief explanation of incorrect
answers. Instead of entering responses on the A-E keys, the soldier makes a
selection by pushing "SAY" then 1, 2, 3, or 4. This activates the Tutor's
digitized voice system. For example, a page in the book shows a picture of a
battlefield scene with several potential targets labeled 1 to 4, and the text
asks the student to identify the most dangerous threat. For one incorrect
answer, the Tutor says "No, out of range, try again" and for another
incorrect answer it says "No, can't kill you, try again." Through this type
of oral feedback the soldier learns not only which answers are incorrect, but
why they are incorrect. Although this information could be presented through
the display screen instead of the voice system, it would be considerably more
distracting, as the soldier would have to repeatedly shift attention from the
picture of the battlefield scene to the display screen. Thus, while the use
of voice technology in this application was not as crucial as it was for word
pronunciation in vocabulary instruction, it still adds a different, valuable
dimension to the instructional process.
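
A minimal sketch of this explanatory feedback follows. The two spoken phrases are quoted from the example above; the assignment of phrases to target numbers, the choice of target 3 as the correct answer, the confirmation wording, and the fallback phrase are all hypothetical, since the paper does not give them.

```python
# Hypothetical data for one battlefield picture with targets labeled 1 to 4.
CORRECT_TARGET = 3  # assumed; the paper does not say which target is correct
SPOKEN_FEEDBACK = {
    1: "No, out of range, try again",    # phrase quoted in the paper
    2: "No, can't kill you, try again",  # phrase quoted in the paper
}


def say_response(target_number):
    """Return the phrase the voice system would speak after SAY + target_number."""
    if target_number == CORRECT_TARGET:
        return "Correct"  # confirmation wording is assumed
    # Targets without a specific explanation get a generic prompt (assumed).
    return SPOKEN_FEEDBACK.get(target_number, "No, try again")
```

Keeping the explanations in a lookup table keyed by target number mirrors the point made in the text: each wrong answer carries its own reason, not just a right/wrong signal.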

Word  War 

The  logic  behind  the  increasing  ratio  review  used  on  Word  War  (Siegel  and 
DiBello,  1980)  applies  to  many  different  rote  memorization  tasks,  not  just 
vocabulary  learning.  In  the  fire  commands  module  it  is  used  to  provide 
practice in weapon/ammunition selection, and in the identification of the name
of threat weapons. For the weapon/ammunition selection routine, the screen
first presents a situation (e.g., T-72 at 1000 meters), followed by three
answer choices, presented one at a time (e.g., SABOT, HEAT, M-240). The
soldier  is  instructed  to  push  "GO"  when  the  correct  answer  appears  on  the 
screen.  The  Word  Wars  for  threat  weapon  identification  were  added  when  data 
from  a  preliminary  field  trial  indicated  that  some  soldiers  had  difficulty 
reading  the  text  because  they  did  not  know,  for  example,  that  an  SPG-9  is  a 
recoilless  anti-tank  gun. 
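
The increasing ratio review used in Word War can be sketched as follows, under stated assumptions. The paper specifies only that a missed item returns after one other item and again after three more items, "etc."; the default spacing `(1, 3)`, the function and parameter names, and the handling of short queues are illustrative choices, not the Tutor's documented behavior.

```python
def increasing_ratio_review(items, answered_correctly, gaps=(1, 3)):
    """Return the order in which items are presented.

    items: initial list of items.
    answered_correctly: callable taking an item and returning True or False
        for its first presentation (review presentations are not re-scored here).
    gaps: number of other items shown before each successive review of a
        missed item (assumed spacing; later gaps are not given in the paper).
    """
    queue = [(item, False) for item in items]  # (item, is_review)
    order = []
    while queue:
        item, is_review = queue.pop(0)
        order.append(item)
        if is_review or answered_correctly(item):
            continue
        position = 0
        for gap in gaps:
            position += gap
            # If too few items remain, the review simply goes to the end.
            queue.insert(min(position, len(queue)), (item, True))
            position += 1  # step past the review copy just inserted
    return order
```

With six items A to F and only A missed, the sketch presents A, B, A, C, D, E, A, F: the missed item returns after one other item and again after three more.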

Picture  Battle 

In  this  game-like  activity  the  Tutor's  voice  system  asks  a  question  based 
on  a  picture  in  the  book,  and  the  soldier  responds  on  the  keyboard.  The 
display  screen  is  used  to  keep  score.  With  each  correct  answer  a  "projectile" 
formed  by  the  dots  on  the  display  screen  moves  one  step  from  left  to  right 
across  the  screen.  For  each  incorrect  answer  an  "enemy"  projectile  moves 
across  the  screen  in  the  opposite  direction.  The  object  of  the  game  is  to 
destroy  the  enemy  target  before  the  enemy  destroys  you.  Hitting  the  enemy 
target  is  accompanied  by  sound  effects  of  a  shell  exploding.  For  vocabulary 
instruction,  Picture  Battle  was  used  to  reinforce  an  association  between  the 
spoken name of an object and a pictorial representation of the object (e.g.,
the Tutor would say "equilibrator" and the soldier would find the picture of
an equilibrator), eliminating entirely the need to read anything. Although
fire commands instruction did not require a task with no reading requirement,
the game-like score keeping features of Picture Battle could still be used to
good  advantage.  In  one  implementation,  a  battlefield  scene  is  pictured  and  a 
situation  described  in  the  text;  when  the  Tutor's  voice  asks  "Initial  Fire 
Command"  the  soldier  must  select  the  appropriate  fire  command  for  the  pictured 
situation. Immediate corrective feedback is provided and the appropriate
projectile advances across the screen; then the soldier is asked to turn to
the next battlefield scene and the process is repeated.
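
The score keeping in Picture Battle can be sketched as a simple race between two projectiles. The number of steps needed to reach the target is not stated in the paper, so `steps_to_target` and the returned labels are assumptions for illustration.

```python
def picture_battle(answers, steps_to_target=5):
    """Play through a sequence of answers (True = correct, False = incorrect).

    Each correct answer advances the soldier's projectile one step; each
    incorrect answer advances the enemy projectile one step in the opposite
    direction. The first projectile to cover steps_to_target steps wins.
    """
    own_steps = 0
    enemy_steps = 0
    for correct in answers:
        if correct:
            own_steps += 1
        else:
            enemy_steps += 1
        if own_steps >= steps_to_target:
            return "soldier wins"
        if enemy_steps >= steps_to_target:
            return "enemy wins"
    return "undecided"
```

Under these assumptions a soldier who answers every item correctly wins after `steps_to_target` items, while a run of errors lets the enemy projectile arrive first.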

Conclusions

As an aid to assessment and training of tank commanders, the Tutor falls
on a continuum between traditional paper-based materials and
microcomputer/videodisk systems. The Tutor lacks the flexibility and moving
graphics capabilities of interactive videodisks, but it is a fraction of the
cost of such systems and is easily portable. Although the Tutor is more
costly than purely paper-based materials, it provides a much more interesting
and interactive environment for assessment and instruction: soldiers may be
immediately branched to more difficult material based on pretest scores,
visual  and  auditory  explanatory  feedback  is  provided,  drill  and  practice 
exercises  in  which  the  item  presentation  order  is  varied  depending  on  student 
performance  are  available,  and  interactions  with  game-like  visual  and  auditory 
scoring  features  are  included.  Every  training  technology  has  advantages  and 
disadvantages,  and  the  hand-held  computerized  Tutor  appears  to  have  a  place  in 
the  instructional  arsenal. 

BACKGROUND

1.  A  major  percentage  of  the  technical  training  of 
officers  and  ratings  conducted  by  Naval  Establishments  has 
traditionally  used  hard  copy  Books  of  Reference  (BRs)  and 
other  Technical  Publications,  supplemented  by  a  series  of 
professionally  made  training  aids.  However,  the  increasing 
complexity  of  HM  ships  has  led  to  a  large  increase  in  the 
documentation required to operate, maintain and repair ships'
equipments  and  systems  such  that  a  modern  frigate  carries 
the  equivalent  of  some  quarter  million  A4  pages  weighing 
some  1^  tons.  Microform,  having  already  demonstrated 
considerable  advantages  in  many  areas,  has  been  chosen  as 
the most suitable medium for more general naval use, and in
particular  for  technical  BRs  and  publications. 

Consequently,  for  training,  the  withdrawal  of  hard  copy  BRs 
and  the  introduction  of  microform  requires  the  conversion  of 
microform  to  readable  print  with  the  aid  of  readers, 
projectors  and  printers. 

2.  The  Royal  Naval  School  of  Educational  and  Training 
Technology (RNSETT), and the Royal Naval Submarine School
(RNSMS), were tasked to identify one or more microform
projectors that could operate under the same conditions and
to the same standards as conventional overhead projectors,
and thereby enable the establishment of a teaching strategy
for  microform  in  the  classroom.  (1) 


SCOPE  OF  THE  INVESTIGATION 

3.  This  investigation  considered: 

a.  The  type  of  microform  projectors  available 
(forward  or  rear  projection). 

b.  The  applicability  of  the  microform  projector 
to  the  classroom  environment. 

c.  The most appropriate teaching strategy for use
with microform.

d.  The range of subject matter to be taught using
microform.

(1)  Investigation into microform projectors, March 1 Jtt ,
Lt U; P " To vey £ "‘1., am. Zuj Z II More , r.or Z ZC. , RAZO.


CLASSROOM  TRIALS 

4.  Trials  were  conducted,  in  the  classroom  environment, 
on  a  range  of  commercially  available  microform  projectors. 
Additionally, some desk-top readers were trialled.

5.  Each  of  the  projectors  was  evaluated  under  similar 
conditions  using  an  instructor  to  give  a  lesson  which  was 
observed  by: 

a.  A  subject  matter  expert. 

b.  An  Instructional  Techniques  officer. 

c.  A  Training  Design  officer. 

d.  A  Quality  Control  officer. 

e.  An officer from the RNSETT Instructional
Techniques  (IT)  Group. 


MICROFORM  PROJECTION 

6.  Forward Projection.  Each of the microform
projectors was evaluated to assess its capability to project
an image onto a screen (as a conventional OHP).  In all
cases the image projected was of poor quality and could only
be read by students close to the screen.  The edges of the
image were blurred, definition was poor and key-stoning
presented a problem.  After a few minutes there was
evidence of eye strain and the instructor found it
increasingly difficult to maintain student concentration.
This was aggravated by the fact that the projectors could
only operate in a darkened room.  Used in this mode the
image projected was unacceptable to both students and
instructor.

7.  Rear Projection.  At the time of the trial only one
projector had a rear projection capability.  When used with
its A2-size screen it could operate under normal classroom
conditions with up to 6 students, comfortably, around the
screen at any one time.  In this mode the image was clear
and eye strain no longer appeared to be a problem.  However,
the instructor could not teach using the strategy of the
lesson and a tutorial style had to be adopted.

8.  Compatible Desk-Top Readers.  It was found that if a
rear projection microform projector was to be used in the
classroom, students needed desk-top readers for back-up and
consolidation.  These readers were operated under the same
conditions as the projector.


APPLICABILITY  TO  THE  CLASSROOM 

9.  The  investigation  confirmed  that  microform  projectors 
cannot  replace  the  conventional  OHP  and  should  not  be 
considered  as  such.  However,  in  the  rear  projection  mode 
they  are  a  most  useful  training  aid  and,  in  this  mode, 
rather than replace the OHP they should be used to
complement  it. 

10.  The microform projector should be just another
training aid available to the instructor.  To be used
successfully, however, class numbers would have to be
small.


TEACHING  STRATEGY 

11.  As  already  indicated  the  rear  projection  microform 
projector  is  suitable  only  for  small  class  teaching  and  the 
tutorial  style  of  instruction  would  appear  to  be  the  most 
appropriate.  This  may  be  an  entirely  new  way  of  teaching 
for  many  instructors.  Used  in  this  way  the  microform 
projector  has  the  added  advantage  that  the  instructor  can 
focus  the  attention  of  the  students  onto  the  material  being 
presented  and  as  a  result  have  more  control  of  the  learning 
situation  (which  could  never  be  guaranteed  using  hard-copy 
publications).

12.  Consequently it is clear that teaching from BRs will
no longer be possible; it will be teaching with BRs.  Used
correctly,  therefore,  microform  may  well  improve  the  quality 
of  instruction.  Instructors  will  require  training  in  the 
use  of  microform  in  the  classroom  and  the  tutorial  method. 
Properly  trained  in  the  correct  techniques  instructors  may 
overcome  some  initial  qualms  about  using  microform. 


SUBJECT  MATTER  RANGE 

13.  Microform projectors enable a number of students
simultaneous access to BR text and diagrams.  It should be
remembered, however, that these are photographed pages and
as viewgraphs are of the very worst type; as such they are
unacceptable.  Similarly microform projectors should not be
used to present material that can already be presented
using the OHP and viewgraph.

14.  The use of microform BRs in training will require the
instructor to direct the attention of students to relevant
pages of BR text and diagrams.  In this situation the
instructor is teaching students how to use the microform BRs,
and this could be considered a skill.  The microform
projector is suited to this task although the question of
indexing pages, cross-referencing and the requirement to
view more than one page at a time will require careful
consideration.  This may be alleviated by the use of two
microform projectors.  Other aspects of instruction should
be conducted in the traditional manner, emphasising the need
for microform projectors to be able to operate under the
same conditions as other training aids.


FUTURE  DEVELOPMENT 

15.  Microform  projectors  are  being  continually  improved. 
There are now available larger screen rear projection models
than  the  one  trialled.  There  has  also  been  an  increase  in 
the  number  and  quality  of  desk-top  readers. 


SUMMARY 

16.  In sum, the investigation concluded that:

a.  Microform  projectors  cannot  operate  under  the 
same  conditions  or  to  the  same  standards  as  the 
conventional  OHP. 

b.  Of  the  microform  projectors  evaluated,  only 
the  rear  projection  model  could  operate 
satisfactorily in the classroom environment.

c.  To  be  used  successfully  microform  projectors 
must  be  employed  as  additional  training  aids  and  must 
be  able  to  operate  under  the  same  conditions  as  OHPs 
and other aids.

d.  The  microform  projector  should  only  be  used  by 
the  instructor  to  direct  the  attention  of  students  to 
the  relevant  pages  of  the  BR  text  and  diagrams.  It 
should  not  be  used  to  present  material  that  is 
normally presented by the OHP and viewgraph.

e.  Used  appropriately,  microform  projectors 
could improve the quality of instruction currently
undertaken  using  hard-copy  publications. 

f.  Microform should be used to teach small groups
of students and is more suited to the tutorial mode
of instruction.  For many instructors this will be an
entirely new way of teaching and they will require
training in the use of microform in the classroom.

The RNSETT has already designed an instructional
module for 'teaching with microform' as part of their
Instructional Techniques Course.


This  article  is  the  copyright  of  the  United  Kingdom 
Ministry  of  Defence  ©  1985. 

Selling Job Analysis
Stanley  D.  Stephenson 
School  of  Business 
Southwest  Texas  State  University 
San  Marcos,  Texas  78666 

To the job analysis practitioner, the value of job analysis
is patently obvious. Moreover, the value of job analysis data is
more and more becoming recognized by others; e.g., witness the
increase in articles about various personnel functions that are
based on having an accurate job and/or position description, a
situation that can only occur when detailed position information
is available. There has also been a corresponding increase in
the number of books published on the topic of job analysis.

With this growing awareness of the value of job analysis
data, one would expect there to be a general eagerness to engage
in job analysis projects. However, such is not necessarily the
case. When it comes down to releasing the monies to conduct a
job analysis (and a job analysis can be expensive), many
managers fail to grasp the overall value of job analysis and
elect to spend their funds on projects that are more immediate,
and perhaps better understood.

The primary problem is that those who truly want the job
analysis done (e.g., the trainers, the job evaluators, etc.) are
not the ones who control the funds; i.e., the managers. On the
other hand, the outside consultants, who are contacted to submit
proposals to perform a job analysis, initially talk primarily
with those who want the work done and not with the managers.
However, when it appears that funds are not to be forthcoming,
those desiring the project will often arrange a meeting between
their manager and the job analysis consultants. Although the
meeting is advertised as a meeting of three groups (the manager,
the job analysts, and those wanting the project), the meeting
often turns out to be a direct discussion between the consultants
and the manager whom they have only recently met. Moreover, the
manager, since he or she has already indicated that funds may not
be available for the project, will be somewhat defensive and will
have erected a barrier to counter what is anticipated to be a
strong sales pitch. It soon becomes apparent that support from
any third party in the meeting (i.e., those who want the
project) is non-existent; these individuals are hoping that
something (which will not harm their careers within the organization)
will happen to change their manager's mind. Consequently, the
job analysts not only have to overcome a negative predisposition
on the part of the manager, they also have to do it alone.

The underlying problem is that managers usually have but a
limited grasp of the value of job analysis data. For instance,
they may know about its value in designing a training program
but be uninformed about its usefulness in designing a performance
evaluation system. Such a situation can be easily understood
when it is remembered that usually one function within personnel/
training is doing the pushing behind the job analysis request.
Their views are presented to the manager, and these views become
the sole basis for the manager's perceptions of the value of job
analysis data. Consequently, the overwhelming objective for any
job analyst making a sales pitch is to break these preconceived
but limited notions and to make the manager aware of the tremendous
wealth of information that is available from the collection
of job analysis data. Once a manager realizes the indispensable
and varied aid that job analysis data can be to the organization,
the manager may be more willing to release funds for the project.

This paper briefly reviews the variety of ways in which job
analysis data can be of use to an organization. These uses will
be discussed with regard to both military and non-military
organizations. Next, a suggested approach for expanding an
organization's awareness of the value of job analysis data will
be demonstrated using actual data.

Uses  of  Job  Analysis  Data 

Job analysis data are useful in the broad areas of selection,
training, production, performance evaluation, promotion,
compensation, and termination/retirement. These areas
will be discussed in the following sections. Military
applications that are not initially obvious will be highlighted.

Selection 


Job analysis data allows a detailed job description to be
written. Such a description is essential to the proper hiring
of new employees; if the nature of the job is not known, how can
a good hiring decision be made? For instance, research has
shown that realistic job previews (RJP) are beneficial in terms
of initial hiring and retention; job analysis data lends itself
to the preparing of RJPs.

Managers in the military do hire/select personnel to come
into their units. In fact many military managers spend considerable
time reviewing the files of military personnel. Moreover,
military managers go through lengthy interview sessions to hire
civilian replacements. The military manager's decision would be
greatly enhanced if he or she spent as much time coming to grips
with the nature of the job being filled as with the nature of the
applicant. However, such information is frequently lacking, and
therefore managers spend their time on information that is
available; e.g., applicants' personnel folders.

Training

Training  can  have  several  purposes.  First,  it  can  be  used  to 
prepare  an  individual  for  entry  into  the  work  force.  Second,  it 
can  be  used  to  maintain  proficiency.  Third,  it  can  be  used  to 
prepare an individual for advancement. In all cases, knowing the
job that is currently being done and/or the job to which an
individual aspires is crucial to designing and implementing a good
training program.

Most military managers realize the first two purposes listed
above. However, many managers do not give proper emphasis to
preparing their personnel for advancement, often because the
"next" job is not understood. Related to this discussion of
training is the concept of backup capability. A good manager
insures that his personnel have the capability to back up key
personnel during periods of illness, vacation, etc. Job analysis
data can provide a basis for determining which other position in
the organization is the most likely candidate for the backup
function. Coming to grips with the concepts of both advancement
and backup also permits the military manager to better understand
job progression, and in the long term the manager is better able
to improve the morale and productivity of his workers.

Production

Job analysis data can permit the manager to improve productivity
in two general ways. First, it can possibly isolate
the differences in task performance of high performers versus low
performers. Second, it can provide a picture of how the entire
work force "fits" together; i.e., identify the gaps or overlaps
in accomplishing the assigned mission. A military manager could
make much better work design decisions with job analysis data
versus having to rely on direct observations alone.

Performance Evaluation, Promotion, and Compensation

These three factors are directly linked. First, performance
must be evaluated. Obviously, a proper evaluation cannot be
accomplished without knowing the details of the job, yet research
has frequently shown that supervisors do not know the details of
their subordinates' work. Second, promotion is a direct end
product of merging the performance in a current job and a
knowledge of the next higher level job.

The  subtleties  of  the  military  promotion  system  are  well 
has a feel for the best next assignment, is in a much better
position to aid his or her subordinates in terms of job progression,
which in the military directly relates to promotion.

Compensation is a direct function of logically tying salary
to worth to the organization; i.e., a classification procedure.
A basic method for doing this is to conduct a job evaluation.
The need for job analysis data as the basis for any job
evaluation is well documented, and most civilian managers are
aware of this fact. However, although informal job evaluations
do occur, compensation issues do not appear to be a relevant
issue for the military.

This brief review demonstrates that job analysis data really
is an organization-wide management tool and not just a specific
personnel tool. Other aspects not discussed are morale, affirmative
action, comparable worth, labor negotiations, and legal
issues in general; job analysis data are also of benefit in

manager, he/she needs to be shown how job analysis data can be
useful in a larger context than originally thought. It is the
"big" picture that will convince a manager that a job analysis
project should be funded. Once a project is funded, of course,
the specific goals of different personnel/training functions
will also be met.

The above argument should not be dismissed as being appropriate
just for the non-military audience. The military manager
has many similarities with his civilian counterpart. Both
groups have a vested interest in morale, productivity, training,
etc. Moreover, with a steady trend toward converting military to
civilian positions, military managers are increasingly becoming
involved with managing a civilian work force. Consequently, many
of the points made in this paper apply to military as well as
civilian managers.

Increasing Job Analysis Awareness: An Applied Example

Recently, a job analysis was done of clerical workers employed
by a small state agency in Texas; small in the sense that
there were only 10 clerical positions surveyed. Job titles were:
File Clerk, Receptionist, Clerk, Secretary III, Secretary II,
Secretary I, and Office Services Supervisor. This small group
of clerical positions has become very useful in demonstrating to
managers how job analysis can be used in a variety of ways.

Prior to meeting with the management of a different organization,
a copy of the clerical task inventory is sent to the organization
with a request that a mid-level clerical worker complete
it. The completed inventory is processed, and the results
are compared to the "population" of 10 workers described in the
preceding paragraph. Examples of the utility of job analysis
data are demonstrated and discussed with management. An illustration
of how this procedure actually works is presented below.

In this case, the inventory was completed by a job incumbent
working in a position with the title of Administrative Secretary
in the target organization. Task responses produced the
results shown in Table 1.


Table 1

Task Cluster Percent Time Spent Results


Task Cluster (Duty)              Percent Time Spent

Written Communications                 15.12
Verbal Communications                  14.40
Distribution                           17.28
Financial                               3.60
General Filing                          3.60
Use of office machines                  3.60
Process general paperwork               0.72
Typing                                 15.12
Management Assistance                   1.44
Miscellaneous                           3.60
Office Management                      11.52
Forms Management                        0.72
Social                                  3.60
Special - Customer Service              0.72
Special - Dictation                     0.72
Special - Notary Public                 3.60
Supervision                             0.72

Total                                 100.08


These results could assist management in all of the areas
mentioned earlier. However, I have found that in a selling
situation managers like to be shown how job analysis data can be
quickly used. Consequently, for the purpose of this paper,
two immediate uses are highlighted: backup capability and
training for advancement.

... immediately inform a manager whether or ...
responsibilities can be met when he/she is ...
worker's tasks performed with the hypothetical
population reveals that for the most part this worker's
duties can be covered by other workers should he or she be
absent. In some task cluster areas, multiple coverage is
available; i.e., more than one other position is performing the
identical tasks. Such a situation is convenient for it allows a
manager to efficiently spread the workload should the worker in
question be absent. The major selling point, however, is that a
manager would know exactly where to turn to insure that an absent
worker's total responsibilities are covered.

A more detailed analysis reveals that the worker in question
performs some unique tasks. In this case, specific task coverage
can also be analyzed; analysis results are presented in Table 2.


Table 2

Specific Administrative Secretary
Task Coverage by Other Positions


Task                              Covered by:

Edit Administrative Material      Sec II, Supervisor
Write Draft Minutes               Supervisor
Answer Inquiries                  Supervisor
Draft Communications              Clerk, Sec III
Process Payment Vouchers          Clerk
Acknowledge Invitations           Sec III, Receptionist
Conduct Tours                     NOT COVERED
Transcribe Dictation              Receptionist
Perform Notary Public Duties      Sec III


Obviously no one other worker can cover these unique and/or
critical tasks. Simply put, a manager would have to delegate
these tasks to different workers in order to have the work
accomplished. Interestingly, for some tasks the best coverage is
provided by a Clerk while for others coverage is provided by an
Office Services Supervisor. More importantly, there is no one in
this population who presently conducts tours. This one piece of
information should forewarn a manager about a possible lack of
coverage should a key worker be absent. Such information would
be perceived as being very useful by any manager.
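
The backup comparison in Table 2 can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used to process the task inventory: the task and position names are transcribed from Table 2, while the function names and the data layout are assumptions made for the example.

```python
# Unique/critical tasks of the Administrative Secretary (from Table 2).
ABSENT_WORKER_TASKS = [
    "Edit Administrative Material",
    "Write Draft Minutes",
    "Answer Inquiries",
    "Draft Communications",
    "Process Payment Vouchers",
    "Acknowledge Invitations",
    "Conduct Tours",
    "Transcribe Dictation",
    "Perform Notary Public Duties",
]

# Which of those tasks each comparison position also performs (from Table 2).
# The dictionary layout is an assumption made for this sketch.
TASKS_BY_POSITION = {
    "Sec II": {"Edit Administrative Material"},
    "Supervisor": {"Edit Administrative Material", "Write Draft Minutes",
                   "Answer Inquiries"},
    "Clerk": {"Draft Communications", "Process Payment Vouchers"},
    "Sec III": {"Draft Communications", "Acknowledge Invitations",
                "Perform Notary Public Duties"},
    "Receptionist": {"Acknowledge Invitations", "Transcribe Dictation"},
}


def backup_coverage(tasks, tasks_by_position):
    """Map each task to the sorted list of positions that also perform it."""
    return {
        task: sorted(pos for pos, done in tasks_by_position.items() if task in done)
        for task in tasks
    }


def uncovered(coverage):
    """Return the tasks that no other position performs."""
    return [task for task, positions in coverage.items() if not positions]


coverage = backup_coverage(ABSENT_WORKER_TASKS, TASKS_BY_POSITION)
for task, positions in coverage.items():
    print(f"{task:30s} {', '.join(positions) or 'NOT COVERED'}")
print("Tasks with no backup:", uncovered(coverage))
```

Run against a full task inventory rather than nine tasks, the same lookup would tell a manager at a glance which duties have multiple coverage and which have none.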

Training for Advancement

Good managers always look out for their employees, even to
the point of insuring that they are prepared for advancement.
Job analysis data permits managers to immediately determine the
skills needed to perform at a higher job level. For instance, in
the example being used, the position in question (Administrative
Secretary) would be equivalent to a Secretary II or Secretary III
in the comparison population. This comparison was based on a
similarity of tasks performed analysis. The next higher job in
the population would be Office Services Supervisor. Table 3
presents a summary of the major time spent differences between
the present position and the next higher position.

Table 3

Percent Time Spent Differences
Office Services Supervisor Versus Administrative Secretary


                                       Percent Time Spent
Task Cluster (Duty)              Administrative   Office Services
                                   Secretary        Supervisor

  Written Communications             15.12             1.01
  Verbal Communications              14.40             4.28
  Distribution                       17.28             3.52
  Education/Training                  0.00             4.03
  Financial                           3.60             0.75
  General Filing                      3.60             0.?0
  Use of office machines              3.60             1.51
  Process general paperwork           0.72             1.01
  Typing                             15.12             6.55
* Management Assistance               1.44             7.78
  Miscellaneous                       3.60             2.51
* Office Management                  11.52            18.86
* Plan/Organize                       0.00             8.55
  Forms Management                    0.72             8.79
  Social                              3.60             4.47
  Special - Customer Service          0.72             0.75
  Special - Dictation                 0.72             0.00
  Special - Notary Public             3.60             0.00
* Supervision                         0.72            20.87
  Other:
    Review/prepare documents                           1.76
    Assist in posting payroll                          1.26
    Retrieve stored information                        1.51

  Total                             100.08            99.56

* Skills not used in the present position but used in the
  higher level position.


These  results  highlight  the  skills  that  the  administrative 
secretary  should  be  acquiring  while  still  in  his/her  present 
position. Then when an opening occurs, the individual would have
the necessary training record to compete for the new position.
However, unless someone is aware of the differences between the
two positions, a proper on-the-job training program cannot be
created. Job analysis data provides that awareness.
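
As a rough illustration of how the Table 3 comparison could be turned into a training priority list, the sketch below ranks the task clusters in which the higher-level position spends noticeably more time than the present one. The percentages are taken from the tables above (Supervision for the Administrative Secretary is entered as 0.72, the value given in Table 1); the 5-point threshold and the function name are assumptions chosen for the example, not part of the original analysis.

```python
# Percent time spent per task cluster for a subset of the clusters.
ADMIN_SECRETARY = {
    "Typing": 15.12,
    "Education/Training": 0.00,
    "Management Assistance": 1.44,
    "Office Management": 11.52,
    "Plan/Organize": 0.00,
    "Forms Management": 0.72,
    "Supervision": 0.72,  # value from Table 1
}
OFFICE_SERVICES_SUPERVISOR = {
    "Typing": 6.55,
    "Education/Training": 4.03,
    "Management Assistance": 7.78,
    "Office Management": 18.86,
    "Plan/Organize": 8.55,
    "Forms Management": 8.79,
    "Supervision": 20.87,
}


def training_gaps(current, target, min_gap=5.0):
    """List (cluster, gap) pairs where the target job spends at least
    min_gap more percentage points than the current job, largest first.
    The min_gap threshold is a hypothetical cut-off."""
    gaps = {
        cluster: round(pct - current.get(cluster, 0.0), 2)
        for cluster, pct in target.items()
    }
    selected = [(c, g) for c, g in gaps.items() if g >= min_gap]
    return sorted(selected, key=lambda item: item[1], reverse=True)


for cluster, gap in training_gaps(ADMIN_SECRETARY, OFFICE_SERVICES_SUPERVISOR):
    print(f"{cluster:25s} +{gap:.2f} points")
```

With these figures the list is headed by Supervision, followed by Plan/Organize, Forms Management, Office Management, and Management Assistance; a different threshold would lengthen or shorten the list.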

These two examples, backup capability and advancement, illustrate
how job analysis data can be put to immediate use by management.
Similar examples of how the same data could be used in
the other areas discussed earlier could be presented. For instance,
a very realistic job preview could be quickly
and accurately created if job analysis data were available. But
the point being made is that job analysis data is not just for
training, not just for creating job descriptions, not just for
conducting job evaluations, etc. Job analysis data is for
managers to use in a wide variety of ways. The better you are able
to deliver this message, the better will be your chances of
having the job analysis program accepted. The approach outlined
here will help you do just that.


LINKING  WORK,  TRAINING,  AND  PROMOTION  IN  THE  COAST  GUARD 


Karen  N.  Jones  and  John  A.  Burt 
U.  S.  Coast  Guard  Institute,  Oklahoma  City,  Oklahoma 


The  Coast  Guard  has  begun  correcting  a  long-standing  problem  in  its 
enlisted  personnel  system  —  the  mismatch  between  the  work  performed  by  Coast 
Guard  personnel  and  the  content  of  our  training  courses  and  promotion 
examinations.  This  mismatch  was  caused  by  the  lack  of  an  accurate  description 
of  the  work  performed  by  Coast  Guard  enlisted  personnel. 

The  work  performed  by  enlisted  personnel  is  described  by  the  enlisted 
qualifications  for  advancement  (quals).  The  quals  are  a  series  of 
occupational  duty  statements  describing  the  jobs  performed  in  each  rating  (job 
occupation)  at  each  paygrade.  The  quals  are  the  basis  for  the  Coast  Guard's 
promotion examinations and rating-specific training. As the primary user of
the  quals,  the  training  community  was  aware  the  quals  for  many  ratings  were 
incomplete  and  out-of-date.  However,  the  training  community  did  not  have  the 
responsibility  and  authority  to  correct  the  problem  —  the  rating  manager  has 
the  responsibility  for  ensuring  the  quals  reflect  the  minimum  occupational 
standards  currently  required  for  the  rating. 

The  rating  manager  (called  rating  program  manager  or  program  manager  in 
the  Coast  Guard)  for  each  rating  is  in  the  headquarters  office  responsible  for 
managing  the  operational  program  or  programs  that  rating  supports.  For 
example, the rating manager for Machinery Technician is in the Office of
Engineering. In most instances, a general duty officer is assigned the
responsibility  for  managing  a  rating  or  group  of  ratings  as  a  collateral  duty. 
Since  these  officers  are  usually  generalists,  they  often  do  not  have  prior 
training  and  experience  in  agency-wide  personnel  management.  In  addition,  a 
large  established  personnel  research  and  development  support  system  (e.g.,  an 
on-going  occupational  analysis  program)  is  not  available  to  assist  them. 

In  1983,  the  training  community,  in  conjunction  with  the  rating  managers, 
started  a  project  to  correct  the  mismatch  between  the  work  performed  by  Coast 
Guard  enlisted  personnel  and  the  content  of  the  quals,  promotion  examinations, 
and  training  courses.  The  vehicle  chosen  to  correct  the  problem  was 
improvement  of  the  promotion  examinations  (called  servicewide  examinations 
within  the  Coast  Guard)  which  are  administered  to  all  eligible  enlisted 
personnel  as  a  part  of  the  promotion  competition. 

The  promotion  examinations  are  designed  to  test  the  job  knowledge 
required  by  personnel  in  each  rating.  With  the  wide  diversity  of  jobs  and 
types  of  duty  stations  within  each  rating,  it  is  feasible  to  test  only  a 
sample  of  the  job  knowledge  required  in  a  rating.  Therefore,  identification 
of what should be tested on an examination is very important. Historically,
the  content  of  each  examination  has  been  selected  by  a  single  subject  matter 
specialist  who  develops  the  examination  based  upon  his  experience  and 
perception  of  what  was  critical  job  knowledge  for  the  personnel  in  that 
rating.  After  the  Coast  Guard  changed  the  promotion  examinations  to  pass/fail 
examinations,  it  became  even  more  important  to  identify  the  knowledges  for 
each rating which should be tested on the promotion examinations and the
knowledges which should be omitted. In this project these knowledges were
identified by a panel of subject matter specialists who prioritized the quals
for testing on each rating's promotion examination at each paygrade.

Prioritization of the quals was designed to enable the Coast Guard to
achieve three goals:


Strengthen the link between the content of the promotion
examinations and the work performed by Coast Guard enlisted
personnel.

Develop an accurate description of the work performed.

Develop promotion examinations and training courses reflecting the
accurate description of the work performed.


Prioritizing the quals would strengthen the link between the content of the
promotion examinations and the work performed. Since meaningful quals were
required to get meaningful prioritizations, we incorporated review and
revision of the quals as a part of the process. This review and revision of
the quals would enable the rating managers to begin developing an accurate
description of the work performed (i.e., to revise the quals). Existence of
the revised quals would then enable the training community to develop courses
and examinations reflecting the actual work performed.


PROCEDURE 


During 1983 and 1984, panels of subject matter specialists in 25 ratings
met for one week each to review, revise, and prioritize their quals. At the
start of the planning for each panel, the rating manager provided a
representative for the panel meeting and selected three subject matter
specialists from the field. In addition to the rating manager's
representative and the three subject matter specialists from the field, the
following personnel were on each panel: one subject matter specialist from
the resident school, the Institute's subject matter specialist (who developed
the promotion examinations and nonresident courses), and a facilitator.
Within the panel structure, the program manager's representative chaired the
meeting and provided policy guidance, the subject matter specialists provided
the rating-specific subject matter expertise, and the facilitator provided
testing/training expertise.

Since there is wide diversity within each rating, the subject matter
specialists were carefully selected to ensure broad coverage of each rating.
Also, efforts were made to include both junior and senior personnel so input
from personnel actually doing the work today would be available. The products
produced by the panels therefore would reflect both the broad range of
experience possessed by senior personnel and the current working knowledge of
the first-line supervisors.

Each panel evaluated each qual for currency and adequacy as written,
recommended revisions to individual quals, drafted or recommended development
of additional quals, and evaluated the adequacy of sections (or content areas)
of the quals. Then the panel prioritized each qual for testing on each
paygrade's promotion examination. In this prioritization, the panel assigned
one of the following evaluations to each qual at each applicable paygrade:
(1) essential to the rating, (2) necessary to the rating, or (3) enhancing to
the rating. Then the panel recommended testing emphasis for each examination,
which equates to number of questions, for each section of the quals.
Throughout the process, the panel members were encouraged to identify problems
and recommend solutions to the program manager and training community.
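
Since testing emphasis equates to a number of questions per section, the conversion can be sketched as a simple proportional allocation. The paper does not describe how the panels or the Institute actually did this arithmetic; the section names, the emphasis weights, the examination length, and the largest-remainder rounding rule below are all assumptions made for illustration.

```python
def allocate_questions(emphasis, total_questions):
    """Split total_questions across sections in proportion to their
    emphasis weights, using largest-remainder rounding so that the
    counts always add up to total_questions."""
    weight_sum = sum(emphasis.values())
    exact = {s: total_questions * w / weight_sum for s, w in emphasis.items()}
    counts = {s: int(share) for s, share in exact.items()}

    # Hand the questions lost to rounding down to the sections with the
    # largest fractional remainders.
    leftover = total_questions - sum(counts.values())
    by_remainder = sorted(exact, key=lambda s: exact[s] - counts[s], reverse=True)
    for section in by_remainder[:leftover]:
        counts[section] += 1
    return counts


# Hypothetical emphasis (percent) for three sections of a rating's quals.
emphasis = {"Section A": 50, "Section B": 30, "Section C": 20}
print(allocate_questions(emphasis, total_questions=48))
```

A rule of this kind keeps the examination length fixed while letting the panel express emphasis in whatever units are convenient, such as percentages or relative weights.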

Two reports of each panel's evaluations and recommendations were
prepared: one for the Institute and one for the rating manager. The report
to the Institute was designed to meet the project's first goal:


Strengthen the link between the content of the promotion
examinations and the work performed by Coast Guard enlisted
personnel.

quals for testing on each paygrade's promotion examination. Since this report
contained  testing  guidelines  which  could  be  misinterpreted  by  personnel  in  the 
field,  it  was  not  given  wide  distribution. 


The report to the rating manager was designed to help the rating manager
meet the second goal:

Develop an accurate description of the work performed.


The rating manager's report contained the problems identified by the panels,
the proposed solutions, the overall evaluation of each section of the quals,
and the recommendations for testing emphasis on each paygrade's promotion
examination. The report also contained the panel's evaluation of the currency
and adequacy of individual quals, recommended action for each qual (i.e.,
retain as written, revise, or delete), and recommendations for additional
quals.

The project's first goal has been achieved and the Coast Guard is working
toward achieving the others. The panels prioritized the quals and thus
strengthened the link between the content of the promotion examinations and
the work performed. The rating managers are reviewing the panels'
recommendations and many of them have already revised the quals, thus providing
the Coast Guard with an accurate description of the work performed. As
revised quals become available, the training community is revising the
training courses and promotion examinations to reflect this accurate
description of the work performed by Coast Guard enlisted personnel.

There have been other tangible and intangible benefits as a result of
this project. For example, as expected the subject matter specialists who
develop the promotion examinations agreed the prioritizations and testing
emphasis reflected the work performed. However, a surprise benefit was
unsolicited comments from the field that "the tests were testing what they
should be testing." These comments indicated the strengthened link between
the content of the examinations and the work performed had been established to
an extent where it could be noticed by personnel who were not aware of our
efforts to improve the promotion examinations.

Another  result  was  several  rating  managers  using  the  panels  as  an 
opportunity  to  get  additional  information.  For  example: 

Some  rating  managers  used  the  panels  to  refine  proposed  quals. 

One rating manager adapted the panel meeting to include extra subject
matter specialists and an extra task: generation of a task list for
an occupational analysis.

Since  the  rating  managers  were  involved  in  planning  and  conducting  the 
panels,  most  of  them  were  ready  to  implement  the  panels'  recommendations  upon 
receipt of the formal report. For many ratings, the panels were able to
provide  the  information  needed  for  the  rating  manager  to  revise  the  quals  so 
additional  work  was  not  required.  In  other  cases,  additional  work  was 
required  and  the  rating  managers  have  been  coordinating  and  funding  the  work. 
Examples  of  this  work  include: 

Occupational  analyses  using  task  lists  generated  by  subject  matter 
specialists.

Follow-up panel meeting to refine the quals revised by this
project's panel and assist the rating manager in performing long-range
planning for the rating.

Top-down occupational analysis of all functions performed at Captain
of  the  Port  and  Marine  Safety  offices  in  enforcing  laws, 
regulations,  and  Coast  Guard  policy. 

The rating managers are reviewing the problems identified by the panels
and the panels' recommendations. Some of the problems identified have already
been corrected. For example, in one rating the personnel did not have access
to all of the manuals needed to prepare for the promotion examinations.
Within a few months, the rating manager had compiled, printed, and distributed
an informal publication with extracts from the relevant manuals.

Other tangible and intangible benefits are continually appearing. As
mentioned previously, the field is beginning to perceive the strengthened link
between the content of the promotion examinations and the work performed. The
impact of other intangible benefits (e.g., increased coordination between the
resident and nonresident school and greater understanding of the
testing/training system) is expected to become apparent over time.


RECOMMENDATIONS FOR SIMILAR PROJECTS


As a result of this project, we have several observations and
recommendations which can be useful in planning and conducting this type of
project within the Coast Guard or in other organizations.

Panel procedures and materials. Keep your instructions, procedures,
definitions, etc. simple. Get the information to the panel members as soon as
possible to allow sufficient time for them to review the panel materials.
(Our not doing this was the major complaint voiced by the panels.) Word all
written materials in terms which are familiar to the reader. This may require
different documents for different groups of people (e.g., management or
subject matter specialists), but it will save time at the panel meeting and
improve the caliber of the products produced. Consider whether or not it is
necessary to specify everything in the instructions. We allowed some degree
of flexibility for our panels and they had little trouble using the procedures
and refining the procedures and definitions to meet the needs of their
individual ratings.

Subject matter specialists' perceptions. The subject matter specialists
stated the project was exceptionally worthwhile and should be repeated. Their
primary concern was that management would not listen to them, that their
recommendations would "get lost" in the system. This concern was handled by
assuring them a formal report would be sent to the rating manager. After
completion of the panel meetings, an area of frustration was the slowness in
implementing the recommended revisions to the quals. If you anticipate a
similar problem, minimize the frustration at the beginning by explaining how
long it should take to implement the recommendations and why (e.g., the quals
manual is revised infrequently and actual printing takes months). In
addition, be aware of problems or projects which may interact with the panel's
work. As might be expected, when we incorporated previous work or
consideration of current problems in the rating into the agenda for the panel
meeting, the panel members were more motivated than when previous work (e.g.,
proposed changes to the quals) or current problems were not included.

Limitations of the process. We did find limits to the process in terms of the
type of quals the subject matter specialists could develop. Sometimes the
subject matter specialists had difficulty generalizing across different
situations. One of the most common responses was: "it varies from unit to
unit, district to district, etc." This problem was particularly pronounced
when the panels tried to develop quals at the E-7 through E-9 levels. At
these levels Coast Guard enlisted personnel are moving from positions with
technical emphasis to positions where leadership and management skills are
emphasized. The problem we found may be limited to organizations similar to
the Coast Guard where duties for a group of people are unit- or even
billet-specific.

Facilitator. Most of our panel members were selected because of their
position (e.g., Institute's subject matter specialist) or rating-specific
experience. The exception to this was the facilitator, who could be selected
based upon other factors. For this project, the facilitator needed
interviewing skills similar to the skills required of a job analyst to: apply
interviewing skills in a group setting; obtain, summarize, and record the
consensus of the panel on revisions, recommendations, and evaluations; and
ensure participation from all panel members. It was also useful for the
facilitator to be familiar with the personnel system, the project's purpose,
and the relationship between the products the panel produced and the personnel
system. In those instances where we knew the facilitators might not have this
type of knowledge, we were able to provide the necessary information in a
short orientation session.

Followup. We are in the last part of the third year on this project. During
the time required for this type of project, you can expect changes in
priorities, funding, and personnel. These changes do not have to stop the
project or stop followup activities. The Coast Guard has handled these
changes by increased coordination between the rating manager and the training
community and among the members of the training community. This has been very
important since the rate of change in the quals (and thus changes in our
examinations and courses) is much higher than it was a few years ago. To make
it easier for your organization, we recommend having one project coordinator
throughout the project or, if that is not feasible, maintaining clear and
concise documentation to ease the transition across personnel.
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rR-\  Soldier  Support  (enter  -  \TF 


Background. The Army Occupational Survey Program (AOSP) has collected, on a
routine basis, Training Factor (TF) data from senior supervisors and managers
within selected Military Occupational Specialties (MOS) since [...]. The
purpose of collecting TF data, supplementing the information obtained from
[...], is to assist training course developers in deciding which [...]
either in the school which has responsibility for [...] or by supervised
on-the-job training. Based upon a [...]
be used for a specific MOS survey,

These factors are as follows:

Percent of members performing
Average percent of time spent by members performing
Task Learning Difficulty (LD)
Consequences of Inadequate Performance (COIP)
Task Delay Tolerance (TDT)
Probability of Deficient Performance (PDP)
Immediacy of Performance (IM)
Relative Frequency (RF)

In addition, a ninth factor, Training Emphasis (TE), although not part of
the 8 Factor model, has been generally used in all AOSP TF surveys.

Despite the widespread use of these nine TF, there have been few, if any,
previous U.S. Army studies indicating the extent to which these TF are related
and the extent to which each of these factors could aid in "critical" task
selection. Therefore, this study addressed the following areas:

(1) The degree of commonality between these nine factors; and
(2) Identification of those factors which could best isolate "critical"
tasks from non-"critical" tasks.


"(  i  it  i ca  1 " 
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'Ms  are  those  found  i  r,  the  Soldier's  Manual  for  each  MOS  and  mav  he 
those collective or individual tasks which are required for mission
[...] in combat or for survival on the battlefield. While there are
[...] tasks for each skill level within a MOS (a skill level corresponds to
[...] cluster of authorized paygrades), the focus of this [...]
tasks required to be performed successfully by personnel [...]
the level at which most formal [...]

Methodology. The results reported in this phase of the study were based on an
[...] of job incumbents and supervisors/managers in the following six [...]:
11B [...]; 63H (Tracked Vehicle Repairer); 91C [...]; 76Y (Unit Supply
Specialist); 95B (Military Police); and 12B [...]. These six MOS were
selected for this study primarily because [...] different types of work
performed by U.S. Army enlisted [...]. [...] percentage of members performing
each task and average [...] members performing each task (comprising the first
two TF) [...] job incumbents holding these MOS. The average percent of [...]
members is based on a 7-point Relative Time Spent scale [...] (Very much below
average) to "7" (Very much above average). [...] job incumbent rates the total
amount of time spent [...] task in his/her present job relative to the time
he/she spends [...].


e  .  .  fn  : 


,  for  wh:< 


ati’igs  wire  provided  by  superv  i  sorv /- 


r  i  i  s ,  ■ 


.  *i — i  t  tssi.'i'e  i  'Miners  (,\c  (  ’*;)  in  each  of  these  MOS ,  may  he 


ecu i r  i : t 
i  • 1  •  r  v  i  s  e . 


(3) TE - The amount of emphasis that should be given to each task [...]
type of systematic training, e.g., formal training school, on-the-job
training. The scale used ranges from "1" (Very low emphasis) to "7" (Very
high emphasis). No response to a task by raters (a value of 0) [...] to
indicate "do not train" and was included in the [...] values per task.


(4) LD - Learning difficulty reflects the amount of time required to
learn to perform the task satisfactorily. The more time required, the higher
the learning difficulty. The scale used ranges from "1" (Extremely low learning
difficulty) to "7" (Extremely high learning difficulty).

(5) COIP - This factor relates to the need for identifying tasks
essential to job performance, when needed, even if they are seldom performed.
The consequence of inadequate performance of certain tasks could result in
injury to personnel, loss of life, or damage to equipment. The scale used
ranges from "1" (Extremely low consequence - negligible effect on
people, equipment/mission) to "7" (Extremely high consequence - may result in
injury/death, serious damage to equipment, or failure to accomplish critical
mission).

(6) TDT - Task Delay Tolerance relates to how much delay can be
tolerated between the time the need for task performance becomes evident and the
time that actual performance begins. The scale used ranges from "1" (Extremely
low - there is virtually no requirement that an individual be able to perform
the task immediately) to "7" (Extremely high - an individual must be able to
perform the activity immediately whenever it is encountered).

(7) PDP - This factor relates to insuring that training is given in
those essential job skills in which job incumbents frequently perform poorly.
It is assumed that for any job, some tasks are more difficult to accomplish than
others. The highest ratings on this factor would reflect the most poorly
performed tasks. The scale used ranges from "1" (Never performed) to "7" (Very
frequently deficiently performed).

(8) IM - This factor is associated with the interval of time between
completion of training and the first performance of the task on the job. The
scale utilized ranges from "1" (Never performed) to "7" (Initially performed
less than [...] months after Advanced Individual Training).

(9) RF - This factor relates to the frequency with which tasks are
performed by job incumbents. It ranges from "1" (Very seldom) to "7" (Very
frequently).

The sample sizes for the entry skill level personnel (job incumbents) in
each of these six MOS follow: (1) MOS 11B - 1424 (skill level 1, corresponding
to authorized paygrades [...]); (2) MOS 63H - 332 (skill level 1); (3) MOS
91C - [...] (skill level 2, corresponding to authorized paygrade E-5); (4) MOS
76Y - [...] (skill level 1); (5) MOS 95B - 1042 (skill level 1); and (6) MOS
12B - 879 (skill level 1). The number of raters by MOS for each of the seven
TF ranged as follows: (1) MOS 11B - [...] for COIP to 104 for IM; (2) MOS 63H -
28 for COIP to [...]; (3) MOS 91C - 31 for TDT to 43 for PDP; (4) MOS 76Y -
29 for COIP to [...] for RF; (5) MOS 95B - [...] to 95 for IM; and (6) MOS
12B - 19 for COIP to 30 for TE and IM.

The Comprehensive Occupational Data Analysis Programs (CODAP) were used to
obtain data files for the six MOS consisting of computed mean values of the nine
TF for each task in the MOS questionnaires. The Statistical Package for the
Social Sciences (SPSS) was then used for each MOS. To examine the degree of
inter-correlation among the nine TF, a Pearson correlation coefficient matrix
was generated, followed by factor analysis with varimax rotation of the principal
factors. To determine which factors could best discriminate "critical" tasks
from non-"critical" tasks, step-wise discriminant function analysis was
utilized. For this latter analysis, only the seven TF based on ratings provided
by senior supervisory/managerial personnel were examined. This was due to the
fact that the senior raters, who could have responded to each task in the
questionnaire for each of the seven rating TF, theoretically would have attached
greater importance to each "critical" task impacting on successful mission
performance. On the other hand, job incumbents would have responded, by
direction, only to those tasks which they had performed or had been trained to
perform in their current duty position. Also, regardless of the outcome of this
study, the two TF obtained from job incumbents would continue to be collected
for each MOS to be surveyed. Conversely, focusing on the seven TF rated by
senior personnel would facilitate reduction of the number of factors which
training development personnel need to examine for "critical" task selection.
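
The inter-correlation step described above can be illustrated with a minimal
sketch in Python. This is not the CODAP/SPSS procedure used in the study. It
assumes each TF is held as a list of mean ratings, one value per task; the
function names (pearson, correlation_matrix) and the task means shown are
hypothetical and are not study data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return a nested dict holding the correlation of every pair of factors."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical mean ratings for five tasks on three training factors.
task_means = {
    "TE": [5.1, 3.2, 6.0, 2.4, 4.4],
    "LD": [2.0, 4.5, 1.8, 5.2, 3.1],
    "COIP": [5.5, 3.0, 6.2, 2.9, 4.0],
}

matrix = correlation_matrix(task_means)
for row in matrix:
    print(row, [round(matrix[row][col], 2) for col in matrix])
```

Each row of the printed output corresponds to one row of an inter-correlation
matrix; the diagonal is 1.0 and the matrix is symmetric.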

Findings.

A. TF Reliability. To determine internal consistency of the data, two
types of reliability coefficients were computed for each of the seven TF for
these six MOS obtained from senior raters: (1) the average inter-rater
reliability of a single rater (r11); and (2) the stepped up reliability
coefficient reflecting the overall group of raters for a particular TF (rkk).
In general, the r11 values were moderate while the rkk values were consistently
high among all the six MOS. With respect to MOS 11B, the r11 values ranged from
[...] for COIP and TDT to .39 for IM; the rkk values ranged from .91 for COIP to
[...] for IM.
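
The paper does not name the formula behind the stepped up coefficient. A
common choice for stepping a single-rater reliability up to a group of k
raters is the Spearman-Brown formula, rkk = k * r11 / (1 + (k - 1) * r11).
The sketch below assumes that formula; the function name and the input values
are hypothetical and are not taken from the study.

```python
def stepped_up_reliability(r11, k):
    """Spearman-Brown estimate of the reliability of the mean of k raters.

    r11 is the average reliability of a single rater; k is the number of
    raters. Assumes raters are interchangeable (an assumption of the formula).
    """
    return k * r11 / (1 + (k - 1) * r11)


# Hypothetical illustration: a modest single-rater reliability becomes a
# high group reliability once several dozen raters are averaged.
single_rater = 0.20
raters = 36
group = stepped_up_reliability(single_rater, raters)
print(round(group, 2))
```

This shows how moderate r11 values and consistently high rkk values can occur
together when the number of raters per TF is large.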
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TABLE  1 

INTER-CORRELATION MATRICES FOR MOS 11B, 63H AND 91C

MOS 11B          TE     LD   COIP    TDT    PDP     IM     RF    %Do  AVG % TIME
TE              1.0   -.45    .81    .78    .86    .91    .77    .86    .76
LD                     1.0   -.25   -.35   -.43   -.58   -.39   -.57   -.55
COIP                          1.0    .90    .56    .79    .81    .64    .61
TDT                                  1.0    .9?    .84    .92    .65    .68
PDP                                         1.0    .92    .94    .76    .76
IM                                                 1.0    .84    .91    .87
RF                                                        1.0    .67    .71
%Do                                                              1.0    .94
AVG % TIME                                                              1.0

MOS 63H          TE     LD   COIP    TDT    PDP     IM     RF    %Do  AVG % TIME
TE              1.0    .68    .77    .84    .50    .76    .87    .62    .59
LD                     1.0    .73    .84    .67    .36    .82    .13    .22
COIP                          1.0    .75    .80    .57    .81    .36    .38
TDT                                  1.0    .82    .56    .90    .37    .42
PDP                                         1.0    .75    .87    .56    .56
IM                                                 1.0    .66    .79    .75
RF                                                        1.0    .44    .49
%Do                                                              1.0    .94
AVG % TIME                                                              1.0

MOS 91C          TE     LD   COIP    TDT    PDP     IM     RF    %Do  AVG % TIME
TE              1.0   -.27    .27    .8?    .73    .78    .76    .83    .76
LD                     1.0    .55    .04   -.40   -.61   -.53   -.47   -.4?
COIP                          1.0    .50   -.04   -.14   -.08   -.04   -.02
TDT                                  1.0    .68    .61    .60    .68    .65
PDP                                         1.0    .81    .83    .82    .77
IM                                                 1.0    .81    .90    .83
RF                                                        1.0    .87    .86
%Do                                                              1.0    .96
AVG % TIME                                                              1.0


Based on the generally high inter-correlations of all nine factors for each of
these six MOS, it was not altogether surprising that through factor analysis
(with varimax rotation of the factor matrices) only two factors emerged for each
of the six MOS. As shown in Table 3, the principal factor for MOS 11B
(accounting for 90 percent of the total variance) reflected significantly high
factor loadings for all factors with the exception of LD. Similarly, the
principal factor for MOS 63H (representing 80 percent of the total variance)
reflected significantly high loadings for all of the seven TF provided by
senior raters; of some interest was the fact that the two TF produced by job
incumbents (%Do and Average percent time spent) had insignificant loadings.
The primary factor for MOS 91C (accounting for 78 percent of the total variance)
had factor loadings above 0.8 for all factors except LD and COIP. Similar
results were also obtained for MOS 76Y, 95B and 12B. As noted in Table 4, the
principal factors for MOS 76Y and MOS 95B (each accounting for 90 percent of the
variance) reflected significantly high loadings for the nine TF with the
exception of COIP for MOS 76Y. The principal factor for MOS 12B (accounting for
[...] percent of the variance) also revealed substantially high loadings for each
of the nine TF. What these findings indicate is that there is apparently just
one underlying general TF, rather than nine different TF.
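
One rough way to see why a single general factor can account for most of the
variance when all inter-correlations are high is to divide the largest
eigenvalue of the correlation matrix by the number of variables. The sketch
below is a simplification assumed for illustration only: it applies power
iteration to a hypothetical 3 x 3 correlation matrix and is not the principal
factor analysis with varimax rotation reported here. The function name and
the matrix values are hypothetical.

```python
def dominant_eigenvalue(matrix, iterations=200):
    """Estimate the largest eigenvalue of a symmetric matrix by power iteration.

    Starts from a vector of ones, which is assumed not to be orthogonal to
    the dominant eigenvector (true when correlations are mostly positive).
    """
    n = len(matrix)
    vector = [1.0] * n
    value = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in product) ** 0.5
        vector = [x / norm for x in product]
        # Rayleigh quotient of the normalized vector.
        value = sum(
            vector[i] * sum(matrix[i][j] * vector[j] for j in range(n))
            for i in range(n)
        )
    return value


# Hypothetical correlation matrix in which every pair of factors correlates .80.
corr = [
    [1.0, 0.8, 0.8],
    [0.8, 1.0, 0.8],
    [0.8, 0.8, 1.0],
]

share = dominant_eigenvalue(corr) / len(corr)
print(round(share, 2))  # roughly 0.87 of the total variance
```

With uniformly high correlations the first component alone carries most of the
variance, which is the pattern the factor analysis results describe.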

TABLE 3 - VARIMAX ROTATED FACTOR LOADINGS OF NINE TRAINING FACTORS
ON PRINCIPAL FACTORS - MOS 11B, 63H AND 91C


C. Prediction of "Critical" Tasks. In predicting "critical" tasks from the
total task inventory, only the seven TF derived from senior raters for each of
these MOS were used as independent (predictor) variables for the reasons noted
previously. The objective of the use of stepwise discriminant analysis was to
determine the fewest number of TF which best predicted "correct" group
membership. That is, it was desired to isolate those predictors which
significantly increased the percent of tasks classified correctly. The first
factor entered was that which had the largest value representing the greatest
power of discrimination as measured by Rao's V. Subsequent variables selected
were those which, when added to the previously selected predictors, measurably
increased the percentage of tasks classified correctly.

Table 5 displays those factors which best achieved this objective for
MOS 11B, 63H and 91C. Table 6 shows those factors for MOS 76Y, 95B and 12B.
For each factor selected in these six MOS, the concomitant change in Rao's V is
displayed together with the cumulative percentage of tasks classified correctly.

TABLE 5 - PREDICTION OF "CRITICAL" TASKS -
MOS 11B, 63H, AND 91C

           INDIVIDUAL FACTOR        CHANGE IN             PERCENT OF TASKS
STEP           SELECTED              RAO'S V            CORRECTLY CLASSIFIED
ENTERED    11B    63H    91C      11B     63H     91C     11B     63H     91C
1          TE     LD     TE      38.0   195.1   123.5    63.7    76.6    69.0
2          COIP   TE     LD      37.9    61.8    35.9    68.5    80.1    74.1
3          IM     COIP   *        5.9    26.2     *      70.8    82.2     *
TOTAL      -      -      -        -       -       -      72.4    83.9    75.2

* The variable for MOS 91C which was entered at step 3 (RF) caused a
statistically insignificant change in Rao's V.


TABLE 6 - PREDICTION OF "CRITICAL" TASKS -
MOS 76Y, 95B, AND 12B

           INDIVIDUAL FACTOR        CHANGE IN             PERCENT OF TASKS
STEP           SELECTED              RAO'S V            CORRECTLY CLASSIFIED
ENTERED    76Y    95B    12B      76Y     95B     12B     76Y     95B     12B
1          TE     TE     TE      64.9   221.2    48.7    74.0    77.4    66.6
2          COIP   *      TDT     [...]    *      13.2    76.6     *      72.7
3          IM     **     COIP    19.2     **      8.5    80.0     **     75.6
TOTAL      -      -      -        -       -       -      80.0    75.6    75.9

* The variable for MOS 95B which was entered at step 2 (PDP) is not shown
because when it was added to TE, which was previously selected, the percent of
tasks correctly classified decreased.
** The variable for MOS 95B which was entered at step 3 (RF) caused a
statistically insignificant change in Rao's V.

As indicated in Tables 5 and 6, TE was the predominant factor, being
entered at step 1 for five of the six MOS and entered at step 2 for the
other MOS (63H). The second best predictor of correctly classified tasks was
COIP, appearing among the top three best predictors in four of the six MOS. Two
other predictors, LD and IM, were noted in two of these MOS.

It was observed that the percentage of tasks classified correctly for
MOS 11B, 63H and 95B was higher for "critical" tasks than for non-"critical"
tasks. For MOS 11B, these percentages were 80.8 percent vs. 71.4 percent. For
MOS 63H, these percentages were [...] percent vs. [...] percent. Similarly, for
MOS 95B, they were [...] percent vs. 75.8 percent. On the other hand, the
percentage of tasks classified correctly for MOS 91C and MOS 12B was higher for
non-"critical" tasks than for "critical" tasks - 79.1 percent vs. 87.4 percent
for MOS 91C and [...] percent vs. [...] percent for MOS 12B, respectively. For
MOS 76Y, these percentages were the same - 80 percent.
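
The percentages above separate classification accuracy by task group. A
minimal sketch of that bookkeeping in Python follows, assuming each task has an
actual label and a label predicted by some discriminant function. The function
name and the ten example tasks are hypothetical and are not study data.

```python
def classification_rates(actual, predicted):
    """Return (overall percent correct, {group: percent correct})."""
    totals = {}
    hits = {}
    for truth, guess in zip(actual, predicted):
        totals[truth] = totals.get(truth, 0) + 1
        hits[truth] = hits.get(truth, 0) + (1 if truth == guess else 0)
    per_group = {group: 100.0 * hits[group] / totals[group] for group in totals}
    overall = 100.0 * sum(hits.values()) / sum(totals.values())
    return overall, per_group


# Hypothetical inventory of ten tasks: four "critical", six non-"critical".
actual = ["critical"] * 4 + ["non-critical"] * 6
predicted = [
    "critical", "critical", "critical", "non-critical",
    "non-critical", "non-critical", "non-critical", "non-critical",
    "critical", "critical",
]

overall, per_group = classification_rates(actual, predicted)
print(overall, per_group)
```

Reporting the per-group rates alongside the overall rate shows whether a set of
predictors favors one group of tasks, as the comparison above does.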

Conclusions and Implications.


It was evident that there is a remarkably high degree of correlation between the
seven TF rated by senior supervisory/managerial personnel and the two TF
relating to job incumbent information (reflecting the percent of members
performing at the entry skill level and the average percent time spent by these
members). What these findings suggest is that rather than nine separate factors
there exists, in reality, only one clearly defined factor. Similarly, in terms
of predicting "critical" vs. non-"critical" tasks, rather than nine individual
factors there is essentially only TE (and probably COIP) which could be thought
of as generally consistent significant predictors. These findings could provide
a basis for training developers to decide which training factor(s) is (are) most
beneficial for "critical" task selection and thus facilitate their efforts to
identify accurately entry level training requirements. In turn, the AOSP could
collect the most useful amount of TF data, primarily to aid these training
developers, from a minimum number of factors. This would enable the AOSP to
significantly improve its administration of occupational surveys.
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car.  ce  has  been  c  .ceptual  .zed  as  a  product  of  individual 
.lit.es.  and  sKii.s  wh.cn  are  measurable  at  one  fn«  .  . 


a.  fust  t.:s  the  organization,  of  er.virormental/orgamzat.onal 
:  w  .  .  impact  cn  tr.e  individual  after  jcfc-entry  and  of  he  per- 

[...] motivation to perform. Previous empirical research has investigated
performance in terms of taxonomies of human abilities, values, and
personality characteristics (Dunnette, 1976). However, until recently
little research has focused on developing taxonomies of environmental/
organizational variables or examining relationships between these factors
and job-related outcomes.


The major purpose of this research was to examine relationships among
individual, organizational, environmental factors, job characteristic
variables, and measures of both maximal (e.g., hands-on and job knowledge
tests) and typical (e.g., supervisory and peer ratings of performance)
performance criteria for first-term soldiers in the Army. This paper
discusses results from [...] Army Work Environment
Questionnaire (AWEQ) to [...] first-term enlisted personnel from five
military occupational specialties (MOS).

A major impetus for research on environmental variables was the work
of Schneider (1978), who proposed that such situational influences as
job/task characteristics, organizational practices (e.g., reward system)
and climate variables could either directly influence performance or
moderate the relationship between cognitive abilities and performance.
During the early 1980's several research projects were initiated to develop
empirically validated taxonomies of environmental variables (e.g., Peters
and O'Connor, 1980; Olson, Borman, Roberson, & Rose, 1984). Results from
the development of situational/environmental taxonomies have suggested
that situational variables can be identified, categorized and reliably
measured. In a series of laboratory studies conducted by Peters and
O'Connor, and their colleagues (for a review see Eulberg, O'Connor, Peters,
& Watson, 1984), results have demonstrated that situational constraints
are significantly related to ineffective task performance, job
dissatisfaction, and increased frustration.

Although correlational field studies have supported the relationships
between environmental/situational variables and affective reactions to the
job (e.g., satisfaction), associations between these factors and ratings
of performance effectiveness have been inconsistent. For example,


The views expressed in this paper are those of the authors and do not
necessarily reflect the view of the U.S. Army Research Institute or the
Department of the Army.
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.  .  *  e  -  1  A  '  ■ 


-  a  r. c  re.ai.or.srjit 


. r»  :'  o  .  t  .  •■  v .  r.a.  c  rstraiiits  am  tjt.n  performance 
•  r  r  •  •  1  cr.t-r.a  a  r.at-cnal  sar.rle  of  :::.ver.- 


...  -o  r.  r, 


. r.  a:,  .  .t . 


r.r._r,  Fetors,  Pulterg  i 


i..  .  ' _  rr~ . a t .  „:.r  «“rc  observed  tetweer. 

*.  r-  a.i  -r  per:'  rmai.'e  ^r  rtt-ru  istmer.t 
r  *  i‘'jvs  t  f  fi  i.i  iron  j  t  K  r 


.*  •  'l  -  .  ,  *  4  ”  *♦  I  *  '»>  \  V  *  *  *  r ^ 

:.&*•.*  *rfc:v;  :a 

'i  ' r  .'c  h  .v*r; 


ress  v h .  c h  ..sad  a  .  f  f " r--.\ t 
t  re„&t,.ons:.ips  .  rs  raided 
mental  factors  ar.d  cer- 


t\-  r-r  ,.v  f.r  re.at.^r.s.n.  p  *  c etwees  er.v.r^!.- 

p-rf  rman  n p.-.-tt  tne  mag.n.  *  ude  cf  the  Currela- 

•.  a  r-  lent  ;:.  tn~  „e.e.  .  f  .nr..  ci  to  rs  fact  *itators 

:  .■  w.  r  -:.v.r  .  :dr*r.«r,  tne  way?  s:  tv.3ticr.al 

.  p  •„  a ,  t:.e  x  . :. : t  ,r  '  mves  t*ga  t-  d ,  am  t.ne 

r.t-".3  ma/  . m r a c :  the  vSCerved  rela- 


r-.  ..-'jr.:.  .  a  r.r  .e  _•  r.ta.r.ed  trig'  *'.rst-tern  er.x.stfcd  personnel 
Ir.er-  wer*.  ’  '2  „r.far.t ry.er.  VH?  MC3/t  1 0 j  armor 
J  ,  '44  radio  teletype  operators  3 1 J  MuS;.  Iigr.t  wheel 
os  '  "r  v ...  ,  arc  ■'  me  j .  ■  a .  oare  3i)ec*.a  lists  \’j' k  Muo  /  • 

,-irpl-j  a*  fv,r  i.nt.r.enta*  United  states  ar.d  two  European 


[...] An assessment battery containing an environmental questionnaire
[...] set of typical (e.g., supervisory ratings) and maximal
[...] job knowledge test) performance measures was used in this research.

Army Work Environment Questionnaire (AWEQ). The Army Work Environment
Questionnaire is a [...]-item multiple choice instrument that measures
[...] dimensions of the Army work environment. The AWEQ was constructed in a
[...] process (Olson, et al. [...]). Briefly, in Stage I, a taxonomy
of [...] environmental influences on soldier performance was derived
[...] application of a critical incident methodology. A total of 262
[...] incidents, generated by Army experts (N = 67) and independently
[...] by six psychologists, identified environmental/organizational
[...] beyond the control of the soldier that had a significant
[...] on performance, either inhibiting or facilitating that performance.
[...] work environment taxonomy contains the following nine "job-oriented"
factors: (1) Resources/Tools/Equipment, (2) Work load/Time Availability,
(3) Training, (4) Physical Working Conditions, (5) Job-Relevant
Information, (6) Job-Relevant Authority, (7) Perceived Job Importance, (8)
[...] Assignment, and (9) Changes in Job Procedures/Equipment, as well as
the following five "climate-oriented" dimensions: (10) Reward System,
(11) Discipline, (12) Individual Support, (13) Job Support/Guidance and
(14) Role Models. In Stage II, items were written to cover the content of
[...] environmental dimension.

[...] the items are descriptive in nature and respondents are asked
to indicate on a 5-point rating scale (e.g., 1 = Very Seldom or Never to
5 = [...] Always) how often each environmental situation described
[...] in their present job.


rr.ar. : e  "■.•a. 


_ T.ne  set  of  c.eal  and  ma :< . r.a performance 

.  r .  .  r  _  .  *see  *r.  tf.ts  "til."  w  a  s  dev*  *'jed  as  <.  oonronent  a  *  a  craader 
re.e  iron  p  r  op  ram.  vend  'ed  t:  ter  F .  .  -a*  A:  Inprc..r.g  t  ;■*>  Selection, 

'  1  s.-c;  f  .oat .  •.,  :ir.d  "til.-at.cn  a:  /  Hr....;ted  !'er.,  .r.r.ei .  .h.s  compre- 

r.-'n.-.ve  nine  /ear  resear.n  effort  was  :r..t.atet  to  r.e.p  th-  Arr.y  access, 

•-  ■  =!  <.  .  •) 


:a.  r---r;cr'.r  "e 


:er:a  .r.  • . 


.  -i  - 1 


cope  "Visor/  and  peer  job 


■f^r-an--.  rat.r.rs.  J-.  js.ite  ter.a..  -al-y-ar.cnored  rating  -ales  v  BARS  ' , 
r,m  a  ir.t.ca*  ...'.dent  .  ,c  a;u.,.s.3  procedure,  were  u.:-  d  to 
:  "ur~  octn  t.ne  X  13  \  foe, -spec:  f  -  ar.c  Army-wide  c  .“.pone  .ts  of  sold.er 
c  .:.  d  effectiveness  .!.  a  -point  conaVior  rating  format.  For 
:■  reo-.-:r  .'.  :  a rt :  o. par.t  ir.  the  X  ,f,  a:.  Army-vid  ?  and  XTS-specific 

.no  wu-  ,  m : ..  t  :  by  j.v:  ag.ng  t.ne  performance  ratings  a  n  .cs  all  i  n  i .  - 

'■  o  rax.r.'i'  _  er forma:. ee  criteria  .r.duded  r.ands-cn  .work  samrle, 
ts  .  oh  a:. -v>.;dge  measures.  The  hands- or  t-sts  for  eacr.  Xo.  con- 

tel  it  tasr'.s  .cer.MitJ  for  t.ne  XIX*  T.ne  .'i.v.dual  performance 

;  n-  'c  ""  f”  tic-:  wer->  scared  ty  t  ra :  -  •  i  ’■rers  on  a  pass-fail  basis 
an  c/era-,  'a  s-cr,  c'tre  was  compute.!  t  r  each  soil.*'  by  averaging 


casoed  a  •  "'ss  t.ne  tasxs  teste: 


K’i I  tit;  e-choi  tests 


i ..  * 

was 


cpei  t'  >csess  v  l  tcr.owleage  re.e".  ■  t  to  t.  .oh  important  task  for 
i.n  i-.-r--.ii  - 1  Know  •  test  scor--  ;.r  eacr  researor.  ,  -« rtieif  -rnt 

1  as  i  p  er rentage  o  'nr*  nsmcer  .  t  iters  ansvr  ooi  correct iy 


Procedures. After the supervisor and peer raters were trained to use the
Army-wide and MOS-specific BARS, they evaluated the job performance of
soldiers in the research sample. Concurrently with these assessments,
first-tour soldiers participating in the research were administered (a)
the Army Work Environment Questionnaire and (b) the appropriate job
knowledge and hands-on test. For all respondents, scores on the
environmental measure were merged with scores from the maximal and
typical performance criteria for analyses.


A  I’D  IT  31 


Jtv-O  A  ‘ 


.  e  1  present 


ctr  tn-  to ta .  sample, 
t. -.ns,  at.  i  re.iali.uty  coefficunts  . 
ratings  or.  the  AWE,  scale  dimer.'  .ons 

coii.piex  sex  ol  b-'-th  facilitating  and  mint— 


t.ne  mams,  itindard  devia- 
r  t.ne  re  sear on  mens  .  ’’-s.  When  mean 
re  collapsed  across  X03  and  irstal- 
Table 1 shows that

-1.7b1,  arid  Job  Support  \X  -  -1.42)  were 
in  contrast,  ..mh  environmental  variables 
-  .  ibCipline  rractices  :._K  =  1.10.), 

.7'-)/,  and  ad-epua ; y  of  role  Models  (_X  “  .74)  were 
Uncorrected reliability estimates indicate that job knowledge tests tend
to be the most reliable of the maximal performance criteria and the
Army-wide BARS (supervisors) have the largest coefficients of the typical
performance measures. Additionally, the AWEQ scale scores, with
coefficients ranging from .57 to .78, have adequate reliabilities for a
research instrument.

Table 2 presents the intercorrelation matrix for the AWEQ scales.
Intercorrelations among the 14 AWEQ scales show that the climate-oriented
dimensions are more highly related than the job-oriented factors. For
Table 1

Means, Standard Deviations, and Coefficients for Selected Measures
Across MOS.

Measures                                N        X       SD     r(1)

Army-wide BARS (Peers)                 727     4.52     .71   .78-.66
Army-wide BARS (Supervisors)           722     4.50     .84   .81-.86
MOS-specific BARS (Peers)              727     4.60     .56   .76-.86
MOS-specific BARS (Supervisors)        718     4.62     .77   .76-.87
Hands-on Test                          685    71.72   16.11   .35-.56
Job Knowledge Test                     745    62.47   10.63   .84-.91

AWEQ Scales (# of items)(2)
Resources (n=[?])                      754     -.99    4.96     .75
Workload (n=[?])                       752     -.67    4.34     .58
Training (n=11)                        736    -3.02    5.91     .64
Physical Working Conditions (n=[?])    741      .67    3.83     .57
Job Authority (n=6)                    760     -.25    3.65     .57
Job Information (n=8)                  726      .45    4.60     .67
Job Importance (n=7)                   725     1.76    4.65     .67
Work Assignment (n=9)                  731    -1.90    6.80     .70
Changes in Job Procedures (n=8)        745     -.89    4.21     .58
Reward System (n=7)                    736    -1.75    5.14     .78
Discipline (n=6)                       751     1.10    4.07     .65
Individual Support (n=9)               727      .79    5.46     .73
Job Support (n=8)                      734    -1.42    5.12     .72
Role Models (n=10)                    7[?]1     .74    5.96     .71

Note. 1). For performance ratings, the range of interrater reliabilities
across MOS are reported.

For Hands-on and Job Knowledge tests, the range of split-half
reliabilities across MOS are reported.

For the Environmental scales, Cronbach's alpha coefficients are
used as measures of internal consistency.

2). Mean scale scores were computed such that "0" is a neutral
environment. Positive mean values indicate positive descriptions
of the environment for that scale. Negative scale means indicate
the opposite.


Table 2


Scale Intercorrelations for the AWEQ.


AWEQ Scales(1)               1     2     3     4     5     6     7     8     9    10    11    12    13    14

 1. Resources                -
 2. Workload               .52     -
 3. Training               .29   .26     -
 4. Working Conditions     .55   .46   .23     -
 5. Job Authority          .47   .52   .39   .51     -
 6. Job Information        .52   .50   .42   .50   .60     -
 7. Job Importance       .2[?]   .20   .30   .22   .32   .33     -
 8. Work Assignment        .26   .24   .66   .13   .35   .36 .4[?]     -
 9. Job Procedures         .49   .51   .44   .43   .50   .51   .24   .40     -
10. Reward System          .36   .40   .40   .37   .58   .56   .29   .33   .45     -
11. Discipline             .31   .31   .18   .37   .47   .43   .30   .14   .36   .45     -
12. Individual Support     .31   .32   .35   .36   .56   .60   .54   .27   .41   .62   .54     -
13. Job Support            .39   .40   .44   .36   .64   .62   .34   .37   .46   .73   .46   .72     -
14. Role Models            .41   .46   .44   .42   .61   .60   .34   .35   .50   .56   .43   .56   .45     -

Note. All AWEQ scale intercorrelations are significant at p < .0[?].

*Correlations significant at p < .0[?].

1). Scales 1-9 are more job-oriented and scales 10-14 are more climate-oriented.
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•.a ted  with 


Reward  System 


r  -  .  ,  ’  :  <n .  .  ;  ,rt  and  Hale  Models  (,£  -  .  Of } .  I'ne 

„<•  ant  -  .t-..r.ty  i .  r.er  s  .  cor. ;«  ptua  1 1  red  as  a  job-related  faster, 
a  .  • .  tr  nr-  -t  '..  ,w  :.a ‘  ■  w.tn  tr.e  'iin.ate  scales  o:  hole  Models 

.■  '  i.M  ,  .  .-J  1  „rt  -  .t-‘  .  Sutse  sent  test  development  work  on  the 

■v  ,,  „•  ■  •  i-  . .te-.-a'  ..ys.s  and  a  pr.noiple  component  factor 

■  (  tit.,.:,  ms  teen  niiOti  J  to  identify  a  sobset 

:  *  si  ’’  .t>-.  -  tmt  les*  def.r.t  tr.e  factor  structure  of  the 

a  .  A.  t:  f.r.  J.rgs  :n:  t;  .,•»  analyses  corrooorate  trie  reJur.dancy 

la:  .  v  s.  t*.  of  tr.e  sca.e;  and  tentatively  suggest 

.!*  a  fa  ' t  s  .,v.  .t.  w.tn  .terns  may  permit  a  more  parsimonious 

i~r.,  ■  •  Amy  w.,rk  environment  constructs,  res  i its 
r  . .  s  j  i- A  a'.  m.e  n..t  been  suffic  ently  cross- validated, 
r-i  .  fn  .s»i.t  .r  .ade  1  focu,.  on  trie  relationships  between  the 
-  _  a .  -  •.  -e.'  fr -n  tr.e  -oncer fuu.  taxon, ,mv ,  and  a  comprehensive  set  of 

. ,  l  perf  ■  r.v.  *-•  rid  mre  objective  performance  .ndices. 

Table 3 presents the correlation coefficients between the 14 AWEQ
scale scores and the set of performance criteria for the total sample.

Several interesting findings emerged. First, the largest correlations
were found between environmental variables and typical performance
measures, mainly the Army-wide BARS. Counting significant coefficients,
46.4% of the correlations between environmental variables and typical
measures, as compared with 28.6% of the correlations with maximal
criteria, were statistically significant. This difference cannot be
attributed to sampling error, since differences in sample sizes for the
correlational values shown in Table 3 were relatively minor.
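
The share of significant coefficients quoted above can be illustrated with a short sketch. The paper does not say which significance test was applied, so the two-tailed Fisher z approximation used here is an assumption, and the function names and the example correlations are hypothetical.

```python
import math
from statistics import NormalDist


def correlation_p_value(r, n):
    """Two-tailed p-value for H0: rho = 0, using Fisher's z approximation.

    Assumption: the original study may have used a different test
    (for example a t test, or a one-tailed criterion).
    """
    if n <= 3 or not -1.0 < r < 1.0:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def percent_significant(correlations, n, alpha=0.05):
    """Percentage of coefficients whose p-value falls below alpha."""
    flags = [correlation_p_value(r, n) < alpha for r in correlations]
    return 100.0 * sum(flags) / len(flags)


# Hypothetical coefficients, not values taken from Table 3.
example = [0.01, 0.09, 0.15, 0.02]
print(percent_significant(example, n=700))
```

If each of the 14 scales was correlated with four typical and two maximal measures, the reported 46.4% and 28.6% would correspond to 26 of 56 and 8 of 28 coefficients; that reading is an inference from the percentages, not a count stated in the paper.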

Second, the environmental dimensions of (a) Perceived Job Importance,
(b) Discipline practices, (c) Individual Support, and (d) the Reward
System generally tended to be significantly correlated with performance
[...] the total sample. In contrast, the AWEQ scale scores on (a)
Resources/Tools/Equipment, (b) Workload/Time Availability, (c) Physical
Working Conditions, and (d) Changes in Job Procedures/Equipment were not
significantly associated with scores on the performance measures.
Although the magnitude of these environment-performance relationships is
lower than that previously reported with Army field test data from
Project A ([...] et al., [...]), fairly consistent trends have been
observed [...] pattern of significant relationships between
[...]-oriented AWEQ scales and performance ratings.

Third, when relationships between typical performance measures and
environmental factors were examined, [illegible] of the correlations
between climate-oriented dimensions and 38.9% of the correlations with
job-oriented factors were significantly related to performance ratings.
Further, a similar pattern of significant relationships was found between
the environmental variables and maximal performance criteria,
specifically 60% of [...] overall [...] effectiveness. However, these
findings [...] consistent, because a larger percentage of
climate-oriented than job-oriented factors were significantly correlated
with performance indices.
Typical Performance Measures


10  11  1^  l;  ii 


r  An" 

,reer: 

.01 

.02 

.CP 

• 

.06 

•  O 

• 

.09 

• 

.23 

.0" 

.04 

• 

.  1  *1 

e 

.14 

« 

.16 

• 

.14 

\  \ 

:  -u.:e  BARS 
r -per . . sc  rs , 

.Oi 

- .  05 

.06* 

.Ob 

• 

.15 

• 

.  1  1 

• 

.17 

• 

.12 

.01 

• 

.  1  1 

• 

.'3 

V 

.  14 

• 

■ '  3 

.09 

?-spe*.f.c  3A?3 

•  C' 

-.04 

.06 

.03 

07 

.04 

• 

.16 

.04 

0 

.03 

• 

.10 

.09* 

.O’ 

.05 

Beers 

:-3pec.f*c  B>.?S 

*  C' 

• 

-.06 

• 

.06 

.03 

.06 

.06 

• 

•  13 

• 

.  1  1 

-.01 

.03 

.06 

.03 

-  O'1 

•  C3 

;.-er. *  sers , 

‘‘.ax:  sal 

Performance 

Measures 

r4s-cr.  Test 

-  .03 

r, 

V' 

-.06 

-.01 

-.04 

.02 

-09*  - 

.02 

-.02 

-.06* 

• 

.06 

.04 

-.07 

.  1.4 

t  <\c.e4ge  Test 

-.05 

-.05 

-.03 

.03 

C 

•  03 

« 

-13  - 

.07 

-.06 

-.09* 

• 

.13 

• 

.  1  1 

-.03 

.01 

Note. AWEQ SCALES: 1=Resources, 2=Workload, 3=Training, 4=Physical Working Conditions,
5=Job Relevant Authority, 6=Job Relevant Information, 7=Perceived Job Importance,
8=Work Assignment, 9=Changes in Job Procedures, 10=Reward System, 11=Discipline,
12=Individual Support, 13=Job-Related Support, 14=Role Models.

*Correlations which are significant at p < .05.


Fourth, consistent relationships were observed between environmental
variables  and  the  typical  performance  measures,  specifically  the  Army-wide 
BARS,  regardless  of  whether  performance  was  evaluated  by  supervisors  or 
peers.  This  finding  indicates  the  existence  of  some  convergence  across 
types of performance criteria with respect to the influence of
environmental factors.

Finally, since sampling error may explain many of the observed
differences between MOS, only a few potentially meaningful
environment-performance relationships will be discussed for the various
Army jobs. In the Infantry (11B MOS) and the Medical Specialist (91A MOS)
jobs, soldiers' performance on nearly all the typical measures was
significantly correlated with Perceived Job Importance (significant rs
ranged from .1[?] to .50). Significant relationships were observed
between Work Assignment and the performance of Armor Crewmen (19E MOS) on
the MOS-specific BARS and the Hands-on measure. In the 31C MOS, the
performance of radio-teletype operators was significantly correlated with
Job Relevant Authority (r = .3[?] observed with Army-wide BARS for
supervisors). Performance of mechanics (63B MOS) on the Army-wide BARS
(peers) was significantly related (r = .50) to scores on the Individual
Support dimension.

Although these specific MOS findings suggest potentially interesting
relationships between environmental variables and performance, these
correlations are based on substantially smaller sample sizes than those
reported in Table 1 and cannot, without cross-validation, be assumed to
represent stable estimates of the true correlational values.

r*r  '•  . o  ■*  >»i  • 
vV»1v  &.J  V  I  *  O 


This research examined correlations between 14 scale scores on an
environmental questionnaire and measures of both typical and maximal
performance. Prior to this applied research in an Army setting, [...]
findings were reported in the empirical literature with respect to
relationships between organizational/environmental variables and
performance.

Results from this applied Army research indicated that significant
relationships exist between job-oriented and climate-related
environmental variables and both job performance ratings (typical
measures) and more maximal, objective criteria: job knowledge and
hands-on tests. Further, these findings suggest that: (1) environmental
factors have their strongest correlations with more typical performance
measures such as Army-wide BARS, (2) climate-oriented environmental
variables have a larger number of significant effects on maximal
performance criteria than job-related environmental dimensions, and (3)
generally such job-oriented environmental variables as
Resources/Tools/Equipment, Physical Working Conditions, and Changes in
Job Procedures/Equipment are not significantly correlated with the
comprehensive set of performance criteria.

Perhaps the weak but significant correlations observed between
environmental dimensions and performance may be related to: (1) a lack of
sufficiently constraining or facilitating conditions on the part of the
environmental variables themselves or (2) contextual factors such as
raters adjusting their performance evaluations to compensate for the
negative/positive effects of specific work environments.
Initial Standardization of an Air Force Organizational
Assessment Survey Instrument*


Lawrence O. Short, Lt Colonel, USAF
Air Force Human Resources Laboratory (AFHRL)

James K. Lowe, Captain, USAF
[illegible] Civil Engineering Squadron

Janice M. Hightower, Captain, USAF
Detachment [illegible], Air Force Operational Test and Evaluation Center

In 19[??], the Directorate of Research and Analysis of the Leadership
and Management Development Center (LMDC) began work on a revision of the
primary data gathering instrument used in the LMDC consulting process, the
Organizational Assessment Package (OAP). The purpose of this report is to
discuss the results of initial standardization research performed on the
second generation OAP, now called the Organizational Assessment Survey (OAS).
More specifically, the report deals with deriving the OAS factor structure
and testing each obtained factor's internal consistency reliability.

In its present form, the OAP survey consists of a computer-scored
response sheet and a 109-item (93 attitudinal and 16 demographic) booklet.
Responses use a scale of one to seven, with a value of "1" generally
indicating strong disagreement or dissatisfaction with the question or
statement, and a "7" indicating strong agreement or satisfaction. Through
factor analysis, the 93 attitudinal items are combined into factors as
presented in Table 1.

Table 1. OAP factors


Skill Variety                          Desired Repetitive Easy Tasks
Task Identity                          Advancement/Recognition
Task Significance                      Management-Supervision
Job Feedback                           Supervisory Communications Climate
Performance Barriers and Blockages     Organizational Communications Climate
Need for Enrichment (Job Desires)      Work Group Effectiveness
Job Performance Goals                  Job Satisfaction
Pride                                  Job Related Training
Task Characteristics                   General Organizational Climate
Work Repetition

Administration of the survey is the first step in the consultation
process. The survey is given to a stratified random sample of the
organization to which LMDC has been invited. The results of the survey are
an important feature in the assessment of [...], supervision, climate, and
productivity in an organization. The results are handled in a confidential
manner between LMDC and the client. After approximately five to six weeks
for analysis, consultants return to the organization to provide feedback of
data to commanders and supervisors.


*All work was completed while the authors were members of the Leadership
and Management Development Center, Research and Analysis Directorate.


When organizational problems are encountered, a consultant and supervisor
develop a management action plan designed to resolve the problem at that
level of the organization. Within six to nine months, the consulting team
returns to readminister the survey instrument as a means to help assess the
impact of the consulting process.

The data from each OAP administration effort are stored in a cumulative
data base currently containing over 200,000 records for research purposes.
These data are aggregated by work group codes developed for this instrument.
The data may be recalled by demographics such as personnel category, age,
sex, Air Force Specialty Code, pay grade, time in service, and educational
level.

The OAP was developed jointly by LMDC and AFHRL at Brooks AFB, Texas
(Hendrix & Halverson, 1979a; 1979b). More recently, additional
standardization work has been done with the OAP. Short and Hamilton (1981)
provided evidence of the factor-by-factor reliability of the instrument
considering both internal consistency and test-retest (stability) aspects.
In addition, Short and Wilkerson (1981) provided evidence in support of the
group differences aspect of OAP construct validity. Webster (1982) also
studied construct validity of the leadership and organizational climate
areas of the OAP by using a modified multi-trait, multi-method approach,
favorably comparing the OAP to the Survey of Organizations (Taylor &
Bowers, 1972). Finally, the stability of the OAP factor structure was
studied across selected functional area and demographic groups (Hightower &
Short, 1982) and across intervals of time (Hightower & Short, 1983). These
studies yielded a slightly different factor structure than that currently
in use, but showed the revised structure to be extremely stable across all
comparison groups. These studies, combined with several years of experience
using the instrument in the consulting process, pointed out ways the OAP
could be revised to enhance the process and the accuracy and precision of
organizational diagnoses.

Method

Overview and subjects

The subjects of the OAP-OAS data gathering came from three operational
bases in the Continental United States. Since the surveys were administered
by LMDC consultants, the test bases were preselected via the consultants'
schedule. The data were gathered by LMDC consultants both as part of the
consulting process and as an effort to test the new OAS instrument. The
subjects responded to both the OAP and the OAS and were then provided the
opportunity to verbally express their thoughts concerning the face validity
of the OAS instrument. The responses from each person on the two surveys
were linked using special coding.

A total of [illegible] personnel took the back-to-back OAP-OAS surveys.
The sample consisted of 85.[?]% males and 14.[?]% females, compared to the
Air Force ratio of 89% male to 11% female. Ages ranged from 17 to
[illegible], although [illegible] were younger than 47. The sample
consisted of 1[?]% officer, 7[?]% enlisted, 7% General Schedule (GS)
civilians, and [?]% Wage Board (WB) civilians. The Air Force
officer/enlisted ratio is 17% officer, 83% enlisted. Note that if the
civilians were removed, the officer/enlisted ratio of the OAS sample would
be 14% officer to 86% enlisted. Fifty-four percent (54%) had made either
one or two PCS moves, and had been on at least one unaccompanied PCS tour.
Over [illegible] of the personnel had daytime work schedules. Racially,
7[?]% of the sample were whites, 1[?]% were blacks (compared to the Air
Force's 1[?]%), and 14% were others. Over [illegible] had more than 1[?]
years in the Air Force, while [illegible] had less than [?] years of total
service time.

Instrumentation


The version of the OAS used for the present study contained 104
attitudinal items for test purposes and ten demographic items in addition
to those in the OAP. The OAS survey items were selected to meet two
criteria. First, previous factor analyses had shown that the
supervisory-related factors and the climate-related factors on the present
OAP are not separate. Therefore, one criterion was to test new items to
create separate factors while eliminating some of the seemingly ambiguous
items which presently load with the composite supervisor-climate factor.
Second, consulting experience has shown the need for some additional
factors not represented in the original OAP. Included in this group are
factors pertaining to stress management and intergroup cooperation.

Procedure


Administration. During each survey administration, the OAP was
administered prior to the OAS to prevent contamination of OAP results.
After completing both surveys, participants were asked to complete a short
questionnaire about the OAS. In addition, [illegible] respondents were
randomly selected and verbally polled to see if they had additional
comments about the survey. If so, those comments were recorded on the
questionnaire.

Factor structure. Derivation of the factors was accomplished by use of
a principal components analysis with a varimax rotation using pairwise
deletion. For factor solutions, the "eigenvalue greater than one" criterion
was used. In addition, a scree test was used to help determine the optimum
number of factors to extract. Following Ware, Snyder, and Wright (1976),
the Factored Homogeneous Item Dimensions (FHID) criteria were used to
assign items to factors. Under these criteria, it was required that all
items in a factor have high loadings (±.40 or greater) only on that factor
and low loadings (±.[??] or less) on all other factors in the matrix. This
method was a useful check not only for item homogeneity but also for item
discriminant validity. An additional requirement imposed was that there be
an absolute difference of at least .10 between the primary loading and the
item's highest secondary loading. Numeric scores for negatively worded
items were reflected before the factor mean was calculated so that
numerically higher factor responses always indicated more favorable
responses.
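
The item-assignment rules just described can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the `low=0.30` cutoff is an assumed placeholder because the low-loading threshold is not legible in this copy, and the reflection formula assumes the one-to-seven response scale described for these surveys.

```python
def assign_items(loadings, high=0.40, low=0.30, min_gap=0.10):
    """Assign each item to a factor under FHID-style loading rules.

    loadings maps an item name to its rotated loadings, one per factor
    (at least two factors). An item is kept only if its largest absolute
    loading is >= high, its second largest is <= low, and the two differ
    by at least min_gap. Rejected items map to None.
    Assumption: low=0.30 is a placeholder, not the paper's value.
    """
    assignments = {}
    for item, row in loadings.items():
        ranked = sorted(((abs(v), j) for j, v in enumerate(row)), reverse=True)
        primary, factor = ranked[0]
        secondary = ranked[1][0]
        keep = (primary >= high and secondary <= low
                and primary - secondary >= min_gap)
        assignments[item] = factor if keep else None
    return assignments


def reflect(score, scale_min=1, scale_max=7):
    """Reverse-score a negatively worded item so higher is more favorable."""
    return scale_min + scale_max - score


# Hypothetical loadings for four items on three factors.
example = {
    "q1": [0.62, 0.12, -0.05],   # clean primary loading on factor 0
    "q2": [0.45, 0.38, 0.10],    # secondary loading too high
    "q3": [-0.55, 0.20, 0.08],   # negative but strong loading on factor 0
    "q4": [0.35, 0.10, 0.05],    # primary loading too weak
}
print(assign_items(example))
print(reflect(2))
```

Working with absolute loadings keeps negatively worded items eligible; their raw scores would then be reflected before factor means are computed.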
Internal Consistency Reliability. The method of choice here was
Cronbach's alpha. Generally the most popular of the internal consistency
methods, alpha can be obtained from a single survey administration and
eliminates the inconsistency of splitting items. Its calculation is based
on the number of items in a scale or factor and the mean interitem
correlation for that same scale or factor. Usually, therefore, as the
average interitem correlation or the number of items increases, so does the
value of alpha. These procedures must be balanced, however. For example,
there is an upper bound on significant increases in alpha from adding
items. In addition, adding items that reduce the interitem correlation will
not necessarily increase alpha. It should also be noted that alpha is often
considered the lower bound of internal consistency reliability. Thus, alpha
may generally be considered a conservative estimate of the true reliability
of a scale or factor (Carmines & Zeller, 1979). Since it is difficult to
attach significance levels to alpha, a more direct standard of comparison
was used. For purposes of this study, alpha coefficients were considered
acceptable at .[?]0 or above, good at .70 or above, and high if .90 or
above (Ware, Davies-Avery, & Stewart, 1978; Carmines & Zeller, 1979;
Hendrix & Halverson, 1979a).
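
The dependence of alpha on the number of items and the mean interitem correlation can be made concrete with the standardized form of the coefficient, alpha = k * r / (1 + (k - 1) * r). The sketch below assumes this standardized form is the one the description refers to; the function name and the example values are hypothetical.

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha from item count and mean interitem r.

    k is the number of items in the scale (at least 2) and mean_r is the
    average correlation among those items.
    """
    if k < 2:
        raise ValueError("alpha needs at least two items")
    return k * mean_r / (1.0 + (k - 1) * mean_r)


# Hypothetical scales sharing a mean interitem correlation of .30.
for k in (2, 5, 10, 20):
    print(k, round(standardized_alpha(k, 0.30), 3))
```

With the mean correlation held fixed, going from 5 to 10 items raises alpha more than going from 10 to 20, which is the diminishing return from adding items noted above.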

Results


Factor Structure


The initial factor analysis included 104 attitudinal items, of which 47
were new items, 34 were reworded versions of OAP items, 14 were nearly
identically worded to OAP items, and 9 were identical to OAP items. Each
item was included to help measure one of the factors. Demographic items
were not included in the factor analysis.

The initial factor analysis was actually two factor analysis problems, as
the SPSS system utilized limited the number of items per run to 100. The
initial analysis extracted 15 factors, 13 of which were expected; the
remaining two factors accounted for less than 2.[?]% of the variance and
had no items with loadings of 0.40 or higher. Based upon the results of the
initial factor analysis, items which did not satisfy the item-loading
criteria mentioned in the Procedure section of the report were eliminated
from the OAS. Selecting the final items to be included in the OAS was an
iterative process to ensure that each item loaded uniquely on its expected
factor and to ensure that no item that loaded uniquely onto an expected
factor was left off the OAS.

The final factor analysis contained only items which satisfied all the
item-loading criteria, ensuring item "dimensionality." This reduced the OAS
from 104 to 77 attitudinal items. Five items from the OAP which dealt with
the Need for Job Enrichment were added to the items to be included in the
OAS. The Need for Job Enrichment items were included in the final factor
analysis even though they had not been field-tested as part of the OAS
instrument. We expect these items to continue to load into the same factor
once they are included in the OAS. The OAS survey contained five Combat
Readiness items which, even though they loaded strongly into one factor,
were removed from the OAS survey to be included instead within LMDC's
Combat Attitude Survey. Only 11 of the final 77 attitudinal items had means
above [illegible].
The 13 final factors are listed in Table 2. The factors accounted for
[illegible] of the variance. We had hoped that the items relating to
Recognition would load into a separate factor; however, these items loaded
with the Organizational Climate items. Two of the extracted factors,
Advancement and Intergroup Cooperation, consisted of three items and the
Work Conditions factor consisted of two items. The factor loadings of these
items onto their respective factors were all above 0.[?]4.

Table 2. Derived OAS factors


Job Goals                        Supervisor
Job Characteristics              Advancement
Task Autonomy                    Organizational Climate
Training                         Intergroup Cooperation
Work Support                     Work Group Effectiveness
Work Conditions                  Need for Job Enrichment
Effective Stress Management


Internal Consistency Reliability

The 77 attitudinal items which loaded onto each of the 13 factors were
used to determine Cronbach's alpha for each factor. Values of Cronbach's
alpha ranged from .71 to .yb, with 10 of the 13 factors having alphas of at
least .b4. The factors with three or fewer items, Advancement, Intergroup
Cooperation, and Work Conditions, had alpha values of .7b, .71, and .bb,
respectively. Even though the three small factors had fewer items than
desired, their reliability scores were acceptable.
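
As an illustration of the reliability computation described above, the following minimal Python sketch computes Cronbach's alpha for one factor from a respondents-by-items matrix of ratings. The function name `cronbach_alpha` and the sample ratings are hypothetical and are not taken from the survey data; the formula is the standard one, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).

```python
from statistics import variance


def cronbach_alpha(responses):
    """Return Cronbach's alpha for a list of respondents' item ratings.

    `responses` holds one row per respondent and one column per item.
    """
    k = len(responses[0])  # number of items in the factor
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings from five respondents on a three-item factor.
sample = [
    [5, 6, 5],
    [3, 3, 4],
    [6, 7, 6],
    [2, 3, 2],
    [4, 4, 5],
]
print(round(cronbach_alpha(sample), 2))
```

Because k/(k-1) grows as the number of items shrinks, factors with only two or three items depend heavily on strong inter-item agreement to reach an acceptable alpha.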

A Final Comment

The present research, then, shows the OAS factor structure to be
statistically sound and generally consistent with the literature as well as
previous experience with similar items on the OAF. The factors also showed
very high internal consistency reliability. The combination of these two
findings provides initial support for the OAS as either a consulting or
survey research instrument. The OAS seems appropriate to replace the OAF as
it provides a more solid, replicable factor structure with fewer items and
factors than the OAF.
PRODUCTIVITY MEASUREMENT: ISSUES AND CHALLENGES

Paul  van  Rijn,  Ph.D. 

US  Army  Research  Institute1 
5001  Eisenhower  Avenue 
Alexandria,  Virginia  22333-5600 


This paper describes some of the lessons learned during the Army
Research Institute's (ARI) efforts to conduct an independent
outside evaluation of the outcome of a macro-level
sociotechnical system (STS) intervention at a large Army
maintenance Depot. This particular intervention focused on a
major division of the Depot which consisted of about 900
workers, mostly Army civilians in skilled blue-collar
occupations, and which had primary responsibility for the repair
and overhaul of the airframe (as opposed to the engine and
transmission) of Army helicopters.


The STS analysis and the development of the design
recommendations were conducted by two outside STS experts
working closely with a team of 12 Depot volunteers who had been
carefully selected to represent both the different levels and
occupations within the Depot. This phase of the intervention
lasted nearly one year and involved: (1) the development of a
Depot Philosophy Statement, (2) the explicit articulation of the
mission of the Depot, (3) the identification of the key
variances (deviations from the norm) affecting the work of the
Depot and their controls, (4) an analysis of the social
(people-related) system of the Depot, and finally, (5) the
development of 12 specific recommendations for change.


Although ARI's preliminary efforts to evaluate the intervention
outcome were initiated from the beginning of the STS analysis,
these efforts were not a prominent component in the discussions
of the STS analysis or in the development of the
recommendations. Goals and expectations for increased
productivity and enhanced quality of working life were
expressed, but seldom in operational or measurable terms. ARI's
role was that of independent outside evaluator.


Toward the end of the STS analysis, when the change
recommendations had been developed, this researcher first became
responsible for ARI's evaluation of the intervention. Numerous
archival measures had been identified previously as potential
indicators of increased productivity and a survey instrument had


1The views expressed in this paper are those of the author and
do not necessarily reflect the views of the US Army Research
Institute or the Department of the Army.
been developed to assess key organizational variables.
Implementation of the recommendations had not yet begun and
the conditions for tracking the intervention before, during, and
after implementation seemed optimal. However, appearances were
deceptive.


First, it became apparent that the archival measures identified
were much more complex than had previously been assumed.
Interpretations of the simplest measures, such as "Number of
aircraft produced," raised a long list of questions. Did the
count include all aircraft or only some models? Were
crash-damaged aircraft with their special requirements included
in the counts? And to what extent did the overhaul of non-Army
aircraft and special projects affect the measure?

It was also apparent that, with few exceptions, the "number of
aircraft produced" matched exactly the number that had been
contracted each month. If 30 aircraft were required, 30 were
produced. If 28, then 28. Overproduction would result in the
aircraft being "carried over" into the next month or would
result in diverting work to neglected shop floor maintenance or
other support tasks. If production was behind schedule, the
remedy might be to authorize more overtime or schedule extra
shifts until the schedule was met. This often resulted in a
flurry of activity toward the end of the month.


Other measures posed similar challenges, not the least of which
was that for every measure there were multiple reporting
systems. The printout of productivity report titles alone was
over one-half inch thick. Nearly everything was counted and
logged into the highly automated reporting systems. There were
no simple ways of sorting through these productivity reports to
determine which reports would be most useful for meaningful
evaluation of this intervention. There was no single expert who
could advise on the utility of each report, many of which were
produced solely to comply with reporting requirements and
formats imposed from outside the Depot.


Due to the multiple reporting systems, it was not unusual to
find a measure as simple as "sick leave usage" produce
inconsistent or discrepant results. Often the data would be
different because they were computed differently. For example,
on one report, "sick leave" was the rate per employee per 100
work hours, while on another report it was the rate per pay
period, or 80 work hours. Even the term "monthly" took on
different meanings. On different reports it might refer to the
varying number of calendar days, the available number of
workdays, or simply two pay periods without regard to the number
of actual workdays involved. Seldom were these variations
obvious from the report titles, and painstaking investigations
were required to resolve even the most obvious discrepancies.
Those individuals responsible for maintaining the various
reports were often not aware of these discrepancies, since they
maintained only one type of report. And although they were
highly dedicated, those maintaining the printouts were not
always the same people who knew the precise details of the
derivation and meaning of the numbers being reported.
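
The unit mismatch described above can be made concrete with a small sketch. Assuming one report states sick leave as hours per employee per 100 work hours and another as hours per employee per 80-hour pay period, both can be rescaled to a common base before they are compared. The function name `rate_per_base` and the figures are hypothetical, not values from the Depot reports.

```python
def rate_per_base(hours_used, hours_basis, common_basis=100.0):
    """Rescale a leave rate quoted per `hours_basis` work hours to `common_basis`."""
    if hours_basis <= 0:
        raise ValueError("hours_basis must be positive")
    return hours_used * common_basis / hours_basis


# Hypothetical figures: report A quotes 4.0 hours per 100 work hours,
# report B quotes 4.0 hours per 80-hour pay period.
report_a = rate_per_base(4.0, 100)
report_b = rate_per_base(4.0, 80)
print(report_a, report_b)
```

Two reports that print the same number can therefore describe different underlying rates, which is why the basis of each figure has to be established before any comparison.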

A  second  major  measurement  issue  that  emerged  was  the  level  of 
analysis.  It  has  already  been  suggested  that  global  measures, 
such  as  "number  of  aircraft  produced,"  are  often  difficult  to
interpret  and  may  mask  real  productivity  gains  made  at  more 
molecular  levels.  Figure  1  shows  some  of  these  levels  and  the 
types  of  subtasks  that  are  required  to  process  a  helicopter 
airframe  through  its  18  overhaul  stations.  Over  60  relatively 
independent  work  centers  (not  all  are  shown)  are  involved  in 
maintenance  and  overhaul  of  the  helicopter  airframe.  Each  work 
center  has  its  function,  from  disassembly  to  electronics  repair, 
to  painting  and  flight  test. 

Narrowing measurement to this level of specificity might be
intuitively attractive were it not for the tremendous volume of
data that would somehow need to be collected, analyzed, and
synthesized. This would not only be a colossal task
for the researcher, but, more importantly, it would place
considerable unexpected strain on the resources of the
organization to duplicate and make all these data available.

Besides the volume of data that would be generated by a work
center level of analysis, the researcher would now also have to
face the challenge of comparing and aggregating data across work
centers with very diverse technical processes and different
outputs. Except for a few measures, such as sick leave, there
were few productivity indicators that could be aggregated
meaningfully across work centers. In addition, it raises the
question of how a researcher can assess the productivity of one
work center when the productivity of that work center depends
significantly on the productivity of one or more other work
centers. The paint shop, for example, cannot be expected to
demonstrate high productivity if it is not provided sufficient
numbers of airframes to paint or if those airframes all arrive
at the same time.

Other major issues are the reliability and validity of the
measures themselves. Even some of the more promising measures
proved on closer scrutiny to be highly variable from month to
month and were likely to be of dubious quality for research or
evaluation purposes. The large monthly variations could often
be traced to unevenness in the submissions of data for the
report, individual differences in the criteria used to report
the data (e.g., What is a reportable defect?), computer system
failures or software upgrading, undocumented changes in the way
the data are recorded or calculated, re-establishment of the
engineering standards or norms from which performance efficiency
ratios are calculated, and so on.
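
One simple way to screen a candidate measure for the month-to-month instability described above is the coefficient of variation (sample standard deviation divided by the mean). The sketch below is illustrative only: the paper does not say that this statistic was used, and the monthly figures are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(monthly_values):
    """Return the sample standard deviation divided by the mean."""
    avg = mean(monthly_values)
    if avg == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return stdev(monthly_values) / avg


# Hypothetical monthly counts of reported defects for two measures.
stable = [100, 102, 98, 101, 99, 100]
erratic = [100, 40, 160, 75, 130, 95]
print(round(coefficient_of_variation(stable), 3))
print(round(coefficient_of_variation(erratic), 3))
```

A high value flags a measure for closer inspection; it does not by itself say whether the variation comes from the work or from the reporting process.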

The timing of the evaluation measurement is also critical. In
an intervention as large and complex as that of the Army Depot,
it was difficult to specify at what point in time a measure is
before, during, or after implementation. This was due to the
fact that different recommendations were implemented at
different times and had different durations. Training, for
example, can be expected to result in an early decrease in
productivity with its benefits not becoming fully realized for
years. Other recommendations, however, can have a more specific
and immediate impact. Finally, it can be argued that
implementation started the moment the analysis phase was begun,
long before the recommendations were implemented or even
developed.

Another important measurement issue is the identification and
control of confounding variables. It is highly unlikely that,
during an intervention of any magnitude, no other factors
affect the measures used in the
evaluation. For example, during the intervention at the Army
Depot, the Depot went into a massive hiring mode. Does this
increase productivity? Or does it decrease productivity as
productive workers have to divert some of their attention to the
training of the new and unskilled hires? Merely identifying
these potential confounding factors is a difficult task, but
this is not nearly as difficult as assessing the direction and
magnitude of the effect, or determining how the effect varies
over time and from work center to work center.

Finally, the survey instrument, developed to study Depot
organizational characteristics that might contribute to an
effective intervention, proved to be insensitive to the changes
that were ultimately implemented. This was because the survey
focused largely on the structure and work processes of the
Depot, while the intervention, as a whole, did not significantly
affect these areas. Given the nature of the organization,
major organizational or process changes would have been highly
unlikely. Consequently, it should come as no surprise that
there were virtually no changes detected over the 14-month
timeframe in which the survey was administered. The few
questions that did demonstrate an attitude change (some over 20
percentage points) tended to be questions that related directly
to the philosophy and principles of the STS approach to
organizational change.


Lessons  Learned 

Based on the experiences with this particular intervention, a
number of productivity measurement lessons were learned:

1.  The  measurement  of  an  organization  is  likely  to  be  much  more 
complex  than  it  initially  appears. 

2. Independent outside assessment of an organizational
intervention is counterproductive. It does not focus the
assessor on the most meaningful measures, and it does not provide
for a meaningful reduction of the multiple measures that are
available.


3. Interpretation of the numbers alone, without full knowledge
of the process from which they derive and the context in which
they occur, is likely to be highly misleading.


4.  The  assessor  must  work  very  closely  with  the  organization  — 
from  the  very  beginning  of  the  intervention  —  and  assist  the 
organization  in  identifying  the  measures  that  will  be  used  to 
assess  the  success  of  the  intervention.  This  requires  defining 
the  goals  and  expectations  of  the  organization  and  the  outcomes 
of  the  change  recommendations  in  terms  of  measurable  outputs. 

5. To the extent possible, the assessor and the organization
should be co-investigators and partners in the intervention,
mutually learning about real organizational concerns and both
being concerned that the data are meaningful and trustworthy and
that the inferences derived from the data are logically sound.

6. To the extent possible, the different stakeholders in the
intervention need to be identified early and their criteria for
judging the pluses and minuses resulting from the intervention
must be articulated.

7. Searching for unitary cause and effect relationships is
ineffective. The assessor and organization must work together
to clarify the major interdependencies and contexts in which
measurement occurs.

8.  The  measurement  aspects  of  an  intervention  need  to  be  an 
integral  part  of  every  phase  of  the  intervention.  To  the  extent 
that  positive  outcomes  can  be  identified  early,  the  momentum  of 
the  intervention  is  likely  to  be  sustained.  It  may  be  advisable 
to  deliberately  include  in  an  intervention  some  recommendations 
that  are  likely  to  have  an  early  demonstrable  payoff.  The  more 
these  payoffs  can  be  expressed  in  terms  of  "hard"  dollars  saved 
or  productivity  gains,  the  more  likely  the  intervention  will 
continue  to  receive  the  top-management  support  it  needs  to 
remain  resourced. 

9. Finally, resources required to collaboratively assess an
intervention are not trivial but are an essential and important
investment, not just for the evaluation of the particular
intervention, but also for the continued monitoring of an
organization's productivity and quality of working life.


A  MORALE  AND  MISSION  RELATING  STRATEGY 


Raymond O. Waldkoetter
U.S. Army Soldier Support Institute
Fort  Benjamin  Harrison,  Indiana  46216-5060 


The Mission Area Analysis (MAA) assesses the long-range capability of a
programed force to perform required combat tasks. The analysis is designed to
discover task deficiencies and correct them with changes or solutions in
doctrine, organization, training, and materiel. The MAA process also provides
a basis for applying advanced technology to future Army operations with the
inherent aim of increasing combat effectiveness. The MAA is performed by a
study group at the Training and Doctrine Command (TRADOC, 1985) Centers or
schools. There are 13 MAAs, projected to be conducted in groups of three once
every four years; they are usually initiated prior to the research,
development, and acquisition cycle. Throughout an MAA's analytic process, the
soldier must be considered as an integral system. Deficiencies in combat
effectiveness must be remedied either by improving soldier performance
directly or by changing doctrine, organization, training, and/or materiel. In
any case, the soldier is the key element in the combat equation.


The individual soldier must be represented as a "combat system"
integrating TRADOC combat development activities, relating to improving
individual performance with human technologies or equipment that will enhance
individual capabilities during combat (Weisz, 1980). Soldier factor issues
are identified during the MAA process to advise or assist proponent TRADOC
combat developers in solving problems related to soldier motivation,
capability, and performance. Direct assistance and advice are given to
proponents so that the focus is on consideration of soldier factors as the
means to enhance individual and unit performance. The Soldier Support Center
(SSC) works with proponents in developing MAAs and study advisory
follow-through to assure soldier factor consideration is also carried over
into the materiel acquisition process (MAP).


The purpose of soldier factor consideration is to: ensure that
potentially critical soldier factors are considered in the MAA process;
provide a guide for considering soldier factors in the MAA process; and
promote efforts to assess the impact of soldier factors on MAA deficiency
solutions. Study efforts in the MAA process should specifically include
consideration of soldier factors. After-the-fact consideration of soldier
factors is largely inefficient. The individual soldier should not be regarded
as an add-on to a materiel system. Soldiers must have a defined role within
the mission area. Serious consideration of soldier factors can reveal ways to
avoid substantial degradation and enhance ability to perform the mission.


The views and opinions expressed in this paper are those of the author and
should not be taken as an official policy of the Department of the Army.


METHOD 


Soldier factor issues become critical and must be remedied to the extent
that they detract from the soldier's job and combat performance. Soldier
factors affecting morale and mission performance must be defined in some
scheme to begin to understand how they may affect duty or combat.

Historically, commanders have relied on the human/soldier dimension or "moral
force" to decide conflicts when their forces were equivalent in other respects
(Zais, 1985). Knowing that relationships probably exist between the global
notion of morale and mission success has not led to any readily reducible
process to confirm these relationships. Since factors related to morale and
mission success are so numerous, it seems best to devise a logical scheme to
analyze "some" that appear to have noticeable effects.

Soldier factors, then, are defined in this paper to be such behavioral
determinants as cohesion, stress, values or ethics, mission (sense of), and
performance (sustained). Studied in the MAA context, these soldier factors
are also seen as being affected by three primary determinants - systems/
weapons, individual/group characteristics, and leader/management actions -
adding to the comprehensive soldier factor effects. Morale as a collective
state of mind can be identified by the interaction of these soldier factors
when producing a positive attitude expressing motivation and satisfaction
(Motowidlo, Dowell, Hopp, Borman, Johnson & Dunnette, 1976). Thus, the fully
adequate consideration of soldier factors also conveys the basis for
understanding and assessing the level of unit soldier factors as morale.

Soldier performance is affected to a lesser or greater extent as soldier
factors are used as corrective actions. Materiel has given limits (e.g.,
tensile strength and payload), and consideration must be given to reliability,
availability, and maintenance. Soldiers also have their given limits (e.g.,
strength, endurance, and morale); and again, consideration must be given to
"responsibility, availability, and maintenance" (fitness and readiness). A
thorough consideration of soldier factors will likely guide the planning for
and improvement of combat performance beyond the MAA process. The concept of
soldier factors - cohesion, stress, values or ethics, sense of mission, and
sustained performance - may be developed under the primary determinants with
descriptive action-task (AT) statements suggested for each factor to improve
mission analysis and corrective action design.

Each proposed corrective action or solution to an identified task
deficiency is derived from concepts related to doctrine, training,
organization, and/or materiel. A recommended solution following the soldier
factor review will succeed only to the degree that it remains compatible with
other projected changes and is supported within the given resource
constraints. The perfect solution is an elusive goal and the best alternative
may demand tradeoffs in deciding which soldier factors should have the most
attention in obtaining a solution. As the MAA proponent task force or
assisting soldier factor analyst examines the interface between solutions and
soldier factors, the following procedures serve to implement consideration of
soldier factors in the MAA process. Consideration of the soldier factors has
to identify and determine for proposed solutions: action-task (AT) statements
that will affect soldier combat effectiveness; task conditions and standards;
the combination of soldier factor considerations that can most likely affect
task and combat success; any expected changes in soldier factors related to
proposed solutions; and, essential elements of analysis and measures of
effectiveness for soldier factors. The probable improvement in critical tasks
is determined in a proposed solution by considering the extent to which those
negatively contributing factors can be reduced or countered, and positively
contributing factors can be augmented and supported for corrective action.

The total impact of soldier factors is evaluated and determined through the:
impact of the proposed solutions on soldier factors; impact of the soldier
factors on proposed solutions; soldier factor initiatives, constraints, and
limitations in proposed solutions; and, acceptability and feasibility of
implementing the proposed solutions.

RESULTS  AND  DISCUSSION 


The analysis proceeds from evaluating the impact of proposed solutions on
the most critical soldier factors to questioning how aspects of proposed
solutions will affect one or more of the soldier factors and vice versa. For
example, will a proposed solution "require leaders to increase training to
cope with increased intensity of battle (stress)?" Action-task (AT)
statements like this are used to judge the affordability and supportability of
a solution. The AT statements can be used as operational task statements and
prioritized in terms of importance for the proponent mission. Priority for
mission accomplishment can be indicated by having a subject-matter panel judge
whether they disagree or agree with the level of relevance of an action toward
solving mission area needs. Then any deficiencies associated with the
prioritized action can be identified and remedied by the most appropriate
types of solution. A second way the proponent may utilize AT statements is by
prioritizing actions from most to least pertinent. Thirdly, if a deficiency
would occur in any primary factor set, judges or raters can indicate to what
degree it would increase the probability of inadequate performance. As
corrective actions/solutions are reviewed in relation to selected soldier
factor actions or issues, the proposed solutions for doctrine, training,
organization, and/or materiel will be defined and prescribed accordingly. If a
solution is constrained by any soldier factors, it may have to be altered.


Although numerous AT statements can be generated, in most cases 20 or fewer
will suffice if an in-depth review is conducted. Some 105 AT statements were
generated with 21 for each factor and seven for each factor under a given
primary determinant. Any variation of AT statements is possible depending on
the mission area concerns.
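
The count of 105 follows from crossing the five soldier factors with the three primary determinants and allowing seven statements per cell. A minimal sketch of that bookkeeping, with hypothetical variable names, is:

```python
from itertools import product

SOLDIER_FACTORS = ["cohesion", "stress", "values or ethics",
                   "sense of mission", "sustained performance"]
PRIMARY_DETERMINANTS = ["systems/weapons",
                        "individual/group characteristics",
                        "leader/management actions"]
STATEMENTS_PER_CELL = 7

# One slot per (factor, determinant, statement number) combination.
slots = [
    (factor, determinant, number)
    for factor, determinant in product(SOLDIER_FACTORS, PRIMARY_DETERMINANTS)
    for number in range(1, STATEMENTS_PER_CELL + 1)
]

per_factor = len(slots) // len(SOLDIER_FACTORS)
print(len(slots), per_factor)
```

Changing the number of statements per cell, or dropping a determinant for a given mission area, changes the total accordingly.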

Table 1 is presented as a working example using an analytical scheme based on
occupational-task analysis procedures.

TABLE 1

SOLDIER FACTOR CONSIDERATIONS

Primary Determinants

Systems/Weapons (Parameters/capabilities of systems/weapons, how to employ,
and what to expect from adequate use and effects.)

Individual/Group Characteristics (Physical health, skilled task performance,
skilled interpersonal performance, integrity/mental health; group goals,
skills, endurance, and efficiency.)

Leader/Management Actions (Refined mix of goals, behaviors, power, and style
used to insure effective combat operations and productive effort.)

A. Cohesion (Sense of belonging, feeling a part of something, sharing of
problems, or bonding together of soldiers and leaders to accomplish any
purpose resolved.)

Systems/Weapons
1. Assure system/weapon training is related to suitable skill level.
2. Determine operational indicators to improve combat assignment.
3. Analyze need to coach or add no** mil for system/weapon team.
4. Identify expected operational measures to sustain efficiency.
5. Specify most helpful actions to support operator/teams.
6. Project critical time allocation for major system/weapon function.
7. Assess strengths and constraints in achieving proficient operations.

Individual/Group Characteristics
1. Estimate whether personnel can function in more than one Ml job.
2. Monitor impact of probable personnel actions on troop confidence.
3. Identify special skills to cope with unusual problems or situations.
4. Develop attitudes to support troop loyalty.
5. Support job satisfaction interests in preparing personnel assignments.
6. Define duties to take advantage of available skills.
7. Recognize examples of esprit and sacrifice in battle.

Leader/Management Actions
1. Communicate clear leader goals and objectives for survival.
2. Set proper examples in combat exercises and combat.
3. Initiate motivating rather than coercive L/M actions.
4. Review and apply cohesive practices and information.
5. Take part in a cross section of activities to help official direction.
6. Supervise and participate in expected standards of excellence.
7. Demonstrate reliability and compassion in crises.

B. Stress (Reaction of the mind/body to extreme demands with sensations of
tension and anxiety, and which must be managed to avoid degraded performance.)

Systems/Weapons
1. tv.fiir, the adequacy of communications for system/weapon selected to be
deployed.

Individual/Group Characteristics
1. Analyze combat exposure levels and parity among troops.

Leader/Management Actions
1. Define procedures to manage stress continually in action.

Essentially, the impact of proposed solutions on soldier factors is found by
indicating which factors are expected to be affected, with some idea of the
magnitude of consequences anticipated. It may be adequate to merely identify
the soldier factors affected and indicate the order of severity. A more
detailed interrogative analysis may be desirable to define just what, how, and
why the factors are posing as unresolved and critical soldier factor issues.
The impact of soldier factors on proposed solutions can be approached by
identifying those factors which can alter constraints or limitations in
solutions. These factors, if augmented or supported, can act as a "force
multiplier" for continuous combat operations.


Conducting a productive consideration of soldier factors engages as much
detailed analysis as a proponent can afford. An iterative consensus is
recommended to relate the magnitude of soldier factor and AT statement
judgments to primary determinants and mission success. Numerous questions are
required in regard to mission areas to assess the importance of soldier
factors and the interaction among such factors. Each proponent will have to
decide on the scope and level of resolution needed to deal with soldier factor
issues in the given mission area. Corrective actions and solutions for task
deficiencies require detailed analysis to audit their development from task
deficiency priority to type of solution (e.g., training). A speculative
scenario will allow the analyst to relate the probability of mission success
to morale by progressively assessing a primary determinant, soldier factors,
and AT statements, as these indicate a need for corrective action.

The analyst or subject-matter panel can select one of the primary
determinants for guiding soldier factor analysis (systems/weapons, individual/
group characteristics, leader/management actions). A value or rank of
importance may be assigned in comparison with the other determinants, e.g., 1,
2, or 3. Next, the analysis process can estimate the magnitude or criticality
for each of the five soldier factors toward assuring adequate combat
effectiveness under the given determinant and threat scenario. If the five
factors would not appear "critical to adequacy" to the analyst or panel, the
scale magnitude would tend to be at the lower scale values. That is, on a
scale of 1 to 5, the relative priority of factors would tend to vary from 1 to
3. Now the analyst or panel can select the AT statements considered most
useful or needed to favorably modify the five soldier factors by augmenting or
enhancing performance and related perception. Additional or new AT statements
or items can be constructed as may be necessary. More detailed AT items might
be prepared under each chosen item to bring about the consideration or desired
improvement through one or more corrective actions or soldier factor
solutions. Next, the magnitude or criticality would be estimated toward
enabling the major AT item or subitem to be performed, i.e., by rating on the
probability of inadequate performance on a scale of 1 to 5 (low to high).

The final phase of analysis is to derive a combined magnitude of the three
levels - primary determinant, soldier factor, and AT item/subitem - to
estimate the criticality expected for some composite probability of inadequate
performance (Stark & Waldkoetter, 1985), specifying needed change or solution
for mission success. The three-level estimate of criticality and
multiplicative/additive combining of each level can yield a conditional/
notional index of priority with analyst/panel consensus for each of the
primary determinants. As the most simplistic example it could suffice to base
a conditional analysis on a single primary determinant rather than all three.

If Leader/Management Actions was ranked highest for a value of 3, the highest
rated soldier factor had a value of 5, and one AT item alone was rated for a
value of 5, a product of 75 (3 x 5 x 5) is obtained. Then, if the other
remaining ranked factors (1 to 4) have only one AT item each rated with values
of 5, a sum of 50 (5 + 10 + 15 + 20) would result, to be added to the 75 for a
total of 125. The lowest possible overall total is 15 following such a forced
solution. A scale mid-point of 70 and above may be arbitrarily given as the
hypothetical area in which serious indications have morale implications and
are noted for the probability of inadequate performance. Further, if
Leader/Management Actions in Table ^ was considered of highest combat
importance, the Cohesion factor was ranked highest for combat effectiveness
(criticality), and only one AT item, "1. Communicate clear leader goals and
objectives for survival," was given the highest probability for inadequate
performance, then the analysis process would have yielded an indication of
trouble in need of some corrective action. That action would be to make
certain the communication task was mandated in doctrine, training,
organization, and/or materiel to build cohesive performance utilizing leader
action to ensure the expected resolution.
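
The arithmetic of this worked example can be sketched in Python. The
combining rule below is inferred from the figures quoted in the text (75, 50,
125, 15, and a mid-point of 70) and is an assumption, not a formula stated by
the authors; the function and argument names are hypothetical.

```python
def composite_index(determinant_rank, factor_ranks, at_ratings):
    """Combine the three levels into one notional priority index.

    Assumed rule (inferred from the worked example): the top-ranked soldier
    factor is multiplied by the determinant rank and by its AT rating; each
    remaining factor contributes (factor rank x AT rating), and these
    contributions are added to the product.
    """
    pairs = sorted(zip(factor_ranks, at_ratings), reverse=True)
    top_rank, top_rating = pairs[0]
    product = determinant_rank * top_rank * top_rating
    additive = sum(rank * rating for rank, rating in pairs[1:])
    return product + additive


# Determinant ranked 3, five factors ranked 5..1, every AT item rated 5.
highest = composite_index(3, [5, 4, 3, 2, 1], [5, 5, 5, 5, 5])
# Determinant ranked 1, same factor ranks, every AT item rated 1.
lowest = composite_index(1, [5, 4, 3, 2, 1], [1, 1, 1, 1, 1])
midpoint = (highest + lowest) / 2
print(highest, lowest, midpoint)
```

Under this assumed rule the extremes are 125 and 15, and their mid-point is
70, which matches the values given in the text.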

This pre-R&D overview offers a procedural basis for soldier factor (SF)
consideration  that  will  logically  and  systematically  identify  potential  SF 
restraints  and  needs  related  to  soldier  impact  on  combat  effectiveness. 
Analytical  consensus  and  review  should  strongly  influence  this  procedural 
model  as  a  notional  strategy.  In  this  approach  it  seems  logical  to  infer  that 
if  leader,  cohesion,  and  communication  actions  are  expected  to  be  impaired  by 
critically inadequate performance, then morale and mission success are
inexorably related. Some principle of proportionality must apply in that any
task  action  for  either  demands  a  complementary  action  for  the  other  to  achieve 
a  dynamic  balance. 

COGENT is a microcomputer based academic management tool that
offers a workable, reliable, and efficient solution to the
implementation and administration of criterion referenced
testing. It provides a means of generating printed tests and
evaluating student responses to not only each test but also to
distinct content areas within each test. These content areas can
be related to the objectives within the test. COGENT provides
detailed test results which indicate mastery of the course
objectives as well as student tracking and record keeping.


COGENT has a built-in test authoring system that enables a
person to easily enter test items into an item bank. An editing
function allows the test author to make changes to an item,
delete an item, insert an item, or move an item to another
position within the item bank. Each test can be subdivided into
up to 10 content areas. Each content area contains all of the
test items for one discrete concept (terminal objective,
enabling objective, lesson topic, etc.). In addition, the content
area parameters, which are used in the printing of the test and
in determining if criterion was met for each content area, are
easily entered into the system with the author program. Up to
100 test items may be entered into each distinct content area.


A test printing function enables the operator to easily print
a hard copy test. Pre-defined parameters determine how many
items are selected from each of the content area item banks. A
maximum of 100 items can be selected from the content area item
banks and printed. The test items can be randomly selected by
the computer or manually selected by the operator. The test
items can be printed in random or sequential order. The choices
for each of the test items can also be printed in random or
sequential order. If computer selection and random printing of
test items and choices are chosen when printing new tests, each
test printed by COGENT will differ from every other, which
assists in minimizing the possibility of test compromise.
Multiple choice and true/false items are accommodated in the
test item banks.
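
The selection and randomization options described above can be sketched in
Python. This is a minimal illustration only: the source does not describe how
COGENT was implemented, so the data layout, the function name build_test, and
the sample content areas are all assumptions.

```python
import random


def build_test(item_banks, items_per_area, rng=None):
    """Draw items at random from each content-area bank, shuffle the
    choices of every item, then shuffle the order of the items."""
    if rng is None:
        rng = random.Random()
    selected = []
    for area, count in items_per_area.items():
        # sample() picks `count` distinct items from this content area.
        for item in rng.sample(item_banks[area], count):
            choices = list(item["choices"])
            rng.shuffle(choices)
            selected.append({"area": area, "stem": item["stem"],
                             "choices": choices, "answer": item["answer"]})
    if len(selected) > 100:
        raise ValueError("a printed test is limited to 100 items")
    rng.shuffle(selected)
    return selected


# Hypothetical item banks; the correct answer is stored as text so it
# stays valid after the choices are shuffled.
banks = {
    "Safety Precautions": [
        {"stem": f"Safety item {n}", "choices": ["A", "B", "C", "D"],
         "answer": "A"}
        for n in range(1, 6)
    ],
    "DC Theory": [
        {"stem": f"DC item {n}", "choices": ["True", "False"],
         "answer": "True"}
        for n in range(1, 6)
    ],
}
test_form = build_test(banks, {"Safety Precautions": 3, "DC Theory": 2},
                       random.Random(1985))
for number, item in enumerate(test_form, start=1):
    print(number, item["stem"], item["choices"])
```

Passing a seeded random.Random makes a given form reproducible; omitting it
gives a different form on each call.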


COGENT will significantly decrease the number of manhours
presently required to prepare, revise, type, and print new
examinations.  New  variations  of  a  test  can  be  generated  as  often 
as  desired. 

The tests are scored from optically scanned answer sheets.
COGENT analyzes not only the entire test but also individual
content areas to determine if each student mastered the subject
matter contained in the content areas. Test results are printed
after the last answer sheet has been scanned. The report lists
all of the students who took the test along with their
associated overall test scores. It also includes for each
student a score for each of the content areas and a notation
indicating any failed content areas. Individualized prescriptive
assignments can also be printed for those students requiring
remediation and retesting. Customized remedial tests for each of
the students requiring retesting can also be printed. These
tests will be made up of only those items from the failed
content area.
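
Scoring by content area can be illustrated with a short Python sketch. It
is not COGENT's code: the function name, the data layout, and the single
70 percent criterion applied to every content area are assumptions made for
the example (in COGENT the criterion is a per-area parameter).

```python
def score_by_area(answer_key, responses, criterion=0.70):
    """Return the overall score, per-area scores, and failed areas.

    answer_key: list of (content_area, correct_answer), one per item.
    responses:  the student's answers, in the same item order.
    """
    totals, correct = {}, {}
    for (area, key), given in zip(answer_key, responses):
        totals[area] = totals.get(area, 0) + 1
        correct[area] = correct.get(area, 0) + (given == key)
    area_scores = {area: correct[area] / totals[area] for area in totals}
    overall = sum(correct.values()) / sum(totals.values())
    failed = [area for area, score in area_scores.items()
              if score < criterion]
    return overall, area_scores, failed


# Hypothetical 10-item test: 4 AC Theory, 4 DC Theory, 2 Safety items.
key = ([("AC Theory", "A")] * 4 + [("DC Theory", "B")] * 4
       + [("Safety Precautions", "C")] * 2)
# The student misses one DC item and both Safety items.
answers = ["A", "A", "A", "A", "B", "B", "B", "D", "A", "A"]
overall, area_scores, failed = score_by_area(key, answers)
print(overall, area_scores, failed)

# A remedial test would draw only on the failed content areas.
remedial_items = [n for n, (area, _) in enumerate(key, start=1)
                  if area in failed]
print(remedial_items)
```

Here the student reaches 70 percent overall yet fails one content area, which
is the case an overall score alone would hide.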


COGENT allows students to be transferred from one class to
another with all test data remaining intact. Students can also
be put into a 'hold' status and then transferred to an active
class. In addition, students can be disenrolled from school. Any
student who is transferred, dropped, or put into a hold status
requires a corresponding code to be entered so that reports can
be printed listing any and all students by any desired code.

COGENT provides the following reports:

•Class Roster - provides an alphabetical listing of the students
including name, rate, SSN, and other pertinent student data.

•Student Cumulative Summary - provides a listing of all the
students along with their overall score on each test. Included
is a notation indicating the students' pass/remedial status for
each test. This notation indicates the number of times each
student was retested in order to achieve criterion on all of the
content areas for each test. This report also includes each
student's cumulative average and the class cumulative average.

•Content Area Performance Summary - prints a detailed analysis
of each student's performance on each test. It includes the
overall test score, the score for each content area the first
time the test is given, and any/all scores on subsequent remedial
tests. It lists the class performance statistics such as: number
of students taking the test; class test average; number of
students passing all content areas on first attempt; number of
students requiring remediation after first attempt; etc.


•Student Ranking - provides a printout of the class with the
students ranked high to low based on each student's cumulative
average. In addition, those students meeting honor graduate
requirements can be recognized with an appropriate notation.


•Academic Review Board Worksheet - printed for those students
who experience academic difficulties. It is used by the Academic
Review Board in determining academic action.

•Student Memo - form printed for students who are assigned
remedial study/testing. It lists the time and location where the
remedial study/testing will take place and is used to assist in
managing the remedial study and retesting program.

•List of Transfers/Drops - provides a printout of the students
who were either transferred to another class or disenrolled
from school. It also includes the transfer/drop code and the
date the action took place. This report can be printed for one
student, all students, any specified transfer/drop code, or all
transfer/drop codes.

•Item Analysis - provides a D & V item analysis and also a
response count analysis which are used to determine the
effectiveness of the test items. It also includes the mean,
median, and standard deviation for the test.
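
The kind of statistics named in the Item Analysis report can be sketched in
Python as follows. The source does not give COGENT's formulas, so this
example assumes the common definition of item difficulty (the proportion of
students answering correctly) and does not attempt a validity index; all
names and sample data are hypothetical.

```python
from collections import Counter
from statistics import mean, median, pstdev


def item_analysis(answer_key, response_sheets):
    """Summarize each item and the test as a whole.

    answer_key:      list of correct answers, one per item.
    response_sheets: list of answer lists, one per student.
    """
    scores = [sum(given == key for given, key in zip(sheet, answer_key))
              for sheet in response_sheets]
    items = []
    for index, key in enumerate(answer_key):
        # Response count: how many students chose each alternative.
        counts = Counter(sheet[index] for sheet in response_sheets)
        difficulty = counts[key] / len(response_sheets)
        items.append({"item": index + 1, "difficulty": difficulty,
                      "responses": dict(counts)})
    summary = {"mean": mean(scores), "median": median(scores),
               "sd": pstdev(scores)}  # population standard deviation
    return items, summary


key = ["A", "B", "T"]
sheets = [["A", "B", "T"], ["A", "C", "F"], ["B", "B", "T"], ["A", "B", "F"]]
items, summary = item_analysis(key, sheets)
for row in items:
    print(row)
print(summary)
```

An item that nearly everyone misses, or a distractor that no one chooses,
shows up directly in the difficulty value and the response counts.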


Some of the benefits that can be realized with COGENT are:

1.  Ability  to  identify  specific  problem  areas  within  a  test: 

COGENT grades the answer sheets and evaluates each test by
content area (objectives). This enables the proctor/instructor
to identify problem areas and remediate to those specific areas.
For example, with conventional testing a test covering subject
matter such as Introduction to the Naval Chain of Command; Naval
Terminology; AC Theory; DC Theory; and Safety Precautions would
be evaluated on a student's overall test score. It is very likely
that a student could pass the test with an overall score of 70%
but incorrectly answer the items on the Safety Precautions
subject matter. With COGENT each of the individual subject
matter areas (content areas) is evaluated and a report is
generated listing all of the students and their scores in all
content areas. The areas where the student failed to meet
criteria are annotated so that individual remediation can be
prescribed.

2. Reduction in remediation/retest time:

Without COGENT, if a student fails a test, he/she is
remediated and retested on the subject matter that is covered by
the entire test. Since specific problem areas within each test
are readily identified by COGENT, the student is remediated and
retested ONLY on the specific areas where he/she is experiencing
difficulty. COGENT can automatically generate a content area
specific remedial test that is tailored to the problem area/s of
each student.

3. Reduction of errors in grading tests:

All tests are computer scored so that the possibility of an
error in grading is greatly reduced.

4. Reduction in time to update/add/change items in the test item
banks:

The authoring program allows the test manager to easily add
or delete items in the test item bank. Also, changes can be made
to existing items. For test audit purposes, each test item has
the corresponding objective annotated along with the date each
item was entered into the item bank.

5. Reduction in time to validate test items:

The item analysis option allows the test manager to easily
generate statistics which can be used to determine item
difficulty and validity.

6. Reduction in time to print a new test:

The time to print a new test is reduced from hours to minutes
because the computer automatically prints the tests from
predetermined data.

7. Reduction of compromised tests:


All  test  items  can  be  randomly  drawn  from  a  bank  of  items. 
The  items  can  then  be  printed  in  random  order  with  the 
distractors  also  randomized.  This  significantly  reduces  the 
possibility  of  test  compromise. 


8.  Reduction  of  student/class  management  time: 


COGENT  maintains  a  record  of  all  student  performance/grades 
and  provides  various  reports  by  individual  or  for  an  entire 
class.  COGENT  also  provides  a  roster  and  a  class  ranking  report 
with  notation  for  prospective  honor  graduates.  Printed  forms  for 
the  Academic  Review  Board  for  those  experiencing  academic 
difficulties  can  also  be  printed.  Students  can  also  be 
transferred  from  one  class  to  another  with  all  grades  remaining 
intact. 


9.  Ability  to  effectively  administer  a  criterion  referenced 
curriculum: 

Without the automatic scoring and evaluation of tests, it
would be impossible to administer an effective
criterion referenced curriculum. COGENT automates all of the
functions necessary to perform these tasks, which results in a
more highly skilled graduate.

COGENT is currently operating on an Apple computer with a 10
megabyte  hard  disk.  In  this  configuration,  it  can  accommodate  up 
to  26  tests  with  each  test  consisting  of  up  to  10  content  areas. 
Each  content  area  can  have  up  to  100  items  in  the  test  bank. 
COGENT  can  manage  the  statistical  data  for  up  to  31  classes, 
each  class  consisting  of  up  to  50  students. 

COGENT is currently being converted to the Zenith model-120
with a 10 megabyte hard disk. This version will be able to
manage multiple courses/curriculums with increased statistical
reporting capabilities. Both versions utilize an Epson 80 column
printer for the printing of reports and tests and a Scantron
model 1200 optical mark reader for the grading of the students'
answer sheets. The selection of hardware was based solely on
existing in-house equipment.


Microcomputer-Based Field Testing for
Human Performance Assessment

P. J. Merkle, Jr., R. S. Kennedy, M. G. Smith,
J. H. Johnson (Essex Corporation)

Abstract 

The chief advantages of paper-and-pencil instruments for field research
are economy and simplicity, but they can have low response rates, exhibit
questionable subject anonymity and security, and typically they require a
proctor. During on-site investigations of the side-effects from flight
simulator operations, paper-and-pencil forms and an Automated Portable Test
System (APTS) were administered. The APTS functioned reliably, data
production increased, and a high incidence of simulator side-effect
symptomatology was detected.

The APTS is comprised of a test battery and questionnaires embodied in
a microcomputer. The battery includes tests of cognition, information
processing, psychomotor skill, memory, reasoning, and others. The
questionnaires are: a mood adjective checklist, motion sickness
symptomatology (with automated scoring), and motion sickness history.

Originally conceived as a behavioral toxicology assessment tool, the APTS
has applications for personnel selection and classification, education and
training assessment, clinical diagnosis, and health care delivery systems.

Several  military  and  university  facilities  have  purchased,  rented  or 
borrowed  an  APTS  and  have  begun  collecting  data  to  determine  the 
sensitivity  to  various  drug,  environment  or  treatment  conditions.  In  these 
preliminary  studies  one  or  more  tests  have  been  shown  sensitive  to 
morphine, chemo-radiotherapy, sleep loss, hypoxia, amphetamine, hyoscine,
etc.  Some  of  these  results  are  reviewed. 

Introduction 

Prior to the advent of low cost microcomputer systems (ca. 1972),
computer technology was slow to find a place in human performance and
laboratories. Several factors contributed to this lag. First and
foremost, the cost of minicomputers was prohibitive for most facilities,
while that of mainframe systems continues to be so. Additionally, the
time-sharing operation mode of undedicated computers, required to make them
cost effective, can make them less reliable assessment instruments than
traditional methods; the speed with which such computers operate is
dependent upon the number of users accessing the system at any specific
time. This factor results in inconsistent timing parameters from one
testing session to the next. If the system is "down" for maintenance or
development, all data collection comes to an abrupt halt. Although
mainframes and minicomputers opened new frontiers for study, they did not
prove to be cost-effective for most behavioral science laboratories.

Microcomputers expand the potential for the study of psychological
phenomena in at least two ways. First, they permit more comprehensive
measurement than traditional tests by providing latency and other
information not ordinarily available with traditional approaches. Second,
they are capable of controlling devices which produce speech, and they are
suited for complex video displays, thereby increasing the number of sensory
modalities that can be involved in a testing situation. Microcomputers are
also well suited for memory tests, because unlike paper-and-pencil methods,
they can present stimuli for short periods of time.
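
As a hedged illustration of these two capabilities (timed stimulus
exposure and response latency), the following Python sketch shows one way a
single trial could be structured. It is not the APTS software, which the
paper describes as written in BASIC and Assembly language; the function
name, its parameters, and the callables passed to it are hypothetical.

```python
import time


def run_trial(present, clear, get_response, exposure_s,
              clock=time.perf_counter):
    """Show a stimulus for a fixed time, then time the subject's response.

    present / clear: callables that draw and erase the stimulus.
    get_response:    callable that blocks until the subject answers.
    clock:           monotonic timer; injectable so it can be simulated.
    """
    present()
    onset = clock()
    while clock() - onset < exposure_s:
        pass  # busy-wait so the exposure is not cut short
    clear()
    cleared = clock()
    answer = get_response()
    latency = clock() - cleared  # seconds from stimulus removal to answer
    return answer, latency


# Stand-in callables; a real test would draw on the display and read keys.
answer, latency = run_trial(
    present=lambda: print("7 4 9 2"),
    clear=lambda: print("-------"),
    get_response=lambda: "7492",
    exposure_s=0.05,
)
print(answer, f"{latency:.6f} s")
```

Because the clock is a parameter, the same routine can be exercised with a
simulated timer, which makes the timing logic checkable without a subject.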

Potential advantages of microcomputers for psychological testing may
include: (1) identification of true deficit in performance from
developmental problems; (2) standardized presentation which may lead to
improved comparability of tests; (3) higher test reliabilities due to more
accurate control of stimulus material; (4) bypassing of infirmities (e.g.,
memory deficits, dyslexia, dystrophy) of certain groups, with performance
testing in innovative modes (e.g., voice recognition, touch panels, large
or back-lighted keyboards, eye-tracked systems); (5) more comprehensive
assessment of individuals; (6) potential for new assessment paradigms and
perspectives for understanding of human performance; and (7) possible
provision of more intrinsically motivating tests to subjects.

Automated Portable Test System Overview

The  AUTOMATED  PORTABLE  TEST  SYSTEM  (APTS)  is  the  first  complete, 
compact  (can  be  hand  held)  system  of  its  kind.  It  is  being  produced 
expressly  for  human  performance  assessment  whether  in  unusual  environments, 
with toxic substances, or with other treatments. APTS is capable of
controlling  and  administering  complex  psychological  testing  routines,  while 
entering  and  collating  responses  and  latencies  with  accuracy  and 
precision.  The  battery  includes  tests  of  cognition,  information 
processing,  psychomotor  skill,  memory,  reasoning,  and  others. 

The APTS makes it possible to maintain a professional research
workspace.  Any  data  collected  or  documents  written  on  the  APTS  can  be 
printed  out  for  examination,  sent  via  modem  to  the  home  office,  or  loaded 
onto  an  external  cartridge  for  easy  transport  to  other  locations.  Also, 
information  from  other  locations  can  be  communicated  through  external 
cartridge  or  cassette  tape  (new  tests  to  use,  modified  data  formats,  etc.). 

The  APTS  is  comprised  of  three  subsystems:  (1)  hardware;  (2)  test 
programs;  and  (3)  system  control. 

Hardware 

The hardware subsystem has been developed around a notebook-sized 8-bit
personal computer: the NEC PC 8201A. Integral to the microcomputer is a
32K internal read-only memory (ROM) containing, in addition to TELCOM and
TEXT EDITOR, a version of Microsoft BASIC. The technical features of the
microcomputer are more fully described in NEC User's Guide (1983). Within
the small, lightweight package, the system has: substantial onboard random
access memory (RAM) capacity expandable to 96K; an external battery option
(8 Ah) providing for more than 100 h of continuous operation; and a
built-in display. Refer to Figure 1 for a hardware overview.

Augmenting the notebook microcomputer is a wide variety of auxiliary
components. Among these, the (32K) RAM cartridges have proved particularly
useful in applications to date. For field applications, the APT System and
Testing Programs are maintained in internal RAM, and, after data
collection, data are transferred to a RAM cartridge for mailing or carrying
from remote sites to a centralized data-base location (see Figure 1). For
laboratory applications, it is anticipated that researchers may find it
useful to extend the capabilities of the microcomputer with an external
display (CRT), floppy disks, and computer interfaces. Overall, the NEC PC
8201A has the extension options required for a wide range of field and
laboratory applications.

Test Programs

The APTS component programs are developed following an iterative
three-stage process: identification, mechanization, and evaluation. The
identification, until now, has been on the basis of sound metric
properties. Future considerations will include operational relevance and
prediction and construct validity. The mechanization is conducted in house
and is the work of Essex Orlando's Chief of Systems (M. G. Smith). The
programming is in BASIC (Microsoft) and Assembly language. Evaluation has
subjected microbased tests to a repeated measures analysis where they were
compared to their paper-and-pencil analogues. Although there have been
exceptions (and these were predictable), nearly all have exhibited strong
commonalities.

System  Control 

The  APTS  has  been  developed  to  provide  a  human  assessment  capability 
suitable  for  use  in  remote  operational  settings.  As  presented  in  the 
system  overview,  the  hardware,  test  program,  and  system  control  subsystems 
meet the requirements for such a system. The notebook-sized NEC PC 8201A
provides  the  basis  for  an  easily  transportable  and  flexible  assessment 
system  with  expansion  options  required  for  a  wide  range  of  field  and 
laboratory  applications.  Additionally,  the  development  of  test  programs  is 
being  conducted  by  a  process  to  assure  efficiency  and  construct  validity. 
This  process  is  based  both  on  evaluation  tools  developed  for  computer  tests 
and  on  lessons  learned  during  the  PETER  Program  (Bittner,  Carter,  Kennedy, 
Harbeson,  &  Krause,  1984;  Smith,  Krause,  Kennedy,  Bittner,  &  Harbeson, 
1983).  Lastly,  the  experimental  control  subsystem  has  been  simplified  for 
use  by  paraprofessionals  with  minimal  training. 

The APTS has substantial prospects for future growth and development.
Attesting to this are recent and ongoing studies that have indicated that
it has considerable promise for use in a broad range of unusual
environments. For example, both an explosive decompression study and
flight testing have found that the system is suitable for high altitude,
chamber, or airborne applications. These applications, coupled with the
evaluation of the design for NASA and NSF, have indicated that the system
could easily be adapted for orbiting shuttle or space station use by
applying spray coatings to the interior and to the exterior case. In
addition, the NEC PC 8201A has demonstrated robust capabilities to operate
in at least the range of 0° to 32° C, to survive drop tests, and to
withstand multiple airport x-ray exposures. The reliability of the system
has been demonstrated during extensive field studies (>103 operational
hours without failure).

Dr. Sam Schiflett of the USAF Aerospace Medical Research Laboratory at
Brooks AFB, Texas, is using the short battery (Grammatical Reasoning,
Pattern Recognition, Code Substitution, and Tapping) at two different
altitudes in a parametric study. Thus far he has shown that performances
are degraded in the following ways: greater deficits are seen at 25,000
ft. vs. 18,000 ft., and cognitive performances are more disrupted than
motor.

Dr.  Mary  Williams  at  the  University  of  New  Orleans  is  studying  whether 
learning  curves  of  persons  with  identified  learning  disabilities  reveal 
different  slopes  and  different  acquisition  rates  for  different  tasks, 
and/or  whether  there  are  relationships  between  initial  score,  rate  of 
acquisition,  and  terminal  score. 

Dr. Darryl Mellard of the Institute for Research in Learning
Disabilities at the University of Kansas, Lawrence, is working on a
contract with the California Community College system to develop and
establish eligibility criteria by identifying persons with learning
disabilities. A sample of persons diagnosed as learning disabled is
practicing 10 different tests following a similar paradigm to that being
used by Dr. Mary Williams. One line of this research is the study of
individual differences in the rate of acquisition of LD subjects versus
normals.

Dr. Todd Jones of the US Coast Guard in Washington, D.C., is using
several  NEC  PCs  implemented  with  the  short  performance  batteries  to  study 
fatigue  at  sea  as  well  as  the  effects  of  ship  motion  on  performance. 

Dr.  Lou  Bandaret  of  the  Army  Natick  Laboratories  in  Massachusetts  is 
examining  "Tower  of  Hanoi"  and  other  complex  games  as  tests.  He  is 
interested  in  performance  effects  of  altitude  and  thermal  stress.  A  study 
is  now  underway  at  that  facility,  under  the  direction  of  Dr.  Charles 
Houston,  on  a  protracted,  simulated  climb  of  Mt.  Everest. 

Dr. James May at the University of New Orleans is comparing the effects
on performance of pre- and postexposure to optokinetic stimulation and
pseudo-Coriolis stimulation to determine normals and persons with bilateral
labyrinthine defects. He also has masking, metacontrast, and other
temporally-based vision tests implemented on a NEC PC8201A, and is doing
pilot work with learning disabilities.

Dr. Ann Streissguth at the University of Washington Medical Center,
Seattle, is conducting a series of studies on the effects of Fetal Alcohol
Syndrome on human performances. There are suggestions that recent memory
loss is a key ingredient in the Fetal Alcohol Syndrome, and one test on the
NEC PC8201A, the complex counting test which assesses a person's ability to
keep track of several things at once with changing states, may be
particularly useful.

Under contract to the Naval Training Equipment Center, Essex has tested
700-800 student pilots before and just after they were exposed to a
ground-based flight trainer. In those studies >10% reported
motion-sickness-like symptomatology. Performance data are being compared
with subject reports.

Essex, through contracts with NASA and NSF, has evaluated the APTS and
is issuing three reports. A final report of the Portable Human Assessment
Battery (PHAB) has been submitted to NSF. The experimental work for these
efforts has been conducted by R. L. Wilkes, at Casper College, Wyoming.
For a complete reference list see Kennedy, Dunlap, Wilkes, and Lane (MTA
1985).

Thus far, two studies have been completed for NASA (I and II). In the
NASA I study, the proposed 6-minute (APTS) battery was administered four
times along with analogous paper-and-pencil tests. Performance appeared
stable and reliable. Two factors emerged. Reprints are available of this
study and the preliminary work which preceded it. The NASA II study
administered a longer battery and those data are presently being analyzed.

The  study  for  NSF  incorporated  a  long  (10  minute)  battery  administered 
over  10  sessions  and  compared  performance  with  measures  of  IQ  (WAIS). 
Performance  was  stable  for  seven  of  the  tests  (vertical  math  and  dynamic 
visual  acuity  [new  tests]  did  not  fare  well).  Four  factors  emerged.  Good 
correlations  with  Performance  Scale  WAIS  scores  (multiple  R  =  .89)  were 
obtained;  somewhat  poorer  with  Verbal  Scale  scores  (multiple  R  =  .67).  A 
preliminary  draft  is  available. 

Dr.  Randall  Kohl  at  NASA's  Space  Biomedical  Research  Institute, 
Houston,  is  undertaking  a  repeated  measures  performance  testing  paradigm 
using  motion  sickness  drugs  and  provocative  motion  sickness  tests  while 
performance  is  assessed.  Data  collection  is  underway. 

Dr. Pia Parth at the Fred Hutchinson Cancer Research Institute,
Seattle, Washington, is administering a battery of tests from the PETER
program on a NEC PC8201A to patients who have received bone marrow
transplants as well as chemo-radiotherapy, and to a related cohort control
group over many months of replications. The project is in its 9th month
and clear-cut differences in both learning and performance are evident in
the treated group. Subsidiary studies using morphine are ongoing, in which
performance decrements on different tests were shown.

Dr. Charles Wood at Louisiana State University in Shreveport,
Louisiana, is studying the effects of dexedrine, hyoscine, and scopolamine
on performance on the short (APTS) battery. Preliminary findings on eight
subjects show performances in predicted directions, some of them
statistically significant. Longer tests and more subjects are planned for
future studies.

CAPT Wayne Coussens at the Tripler Hospital in Hawaii is in the process
of evaluating the effects of work at altitude and mountain sickness on
performance.

CDR Charles Hutchins and his students at the US Naval Postgraduate
School, Monterey, California, have shown that up to 40 hours of sleep loss
reduces performance (p < .02) on APTS tests, particularly Code Substitution.

Dr. Michael McCauley of Monterey Technology Inc., Monterey,
California, under a US Coast Guard contract, has collected some at-sea data
to study workload, fatigue, and environmental stress.

The APTS has proven to be rugged and reliable, but it is the first
generation of such a device. As more tests are implemented on the current
APTS and as new computer technologies develop, an even more flexible system
will emerge. A current limitation of the APTS is the Liquid Crystal
Display (LCD), which is difficult to read under some lighting conditions.
It is hoped that new flat panel displays will be integrated into the next
generation of portable computers and hence a future generation of the
APTS. Future plans have also been made to transport the APTS to the IBM PC
compatible environment. This move will broaden the scope of the APTS
applications.

Support for this project was provided by National Aeronautics and
Space Administration Contract NAS 9-16982 and the
National Science Foundation, Contract No. 00559


OVERVIEW OF PERIPHERAL INTERFACES

In recent years, the U.S. Army has developed and implemented
increasingly complex and sophisticated weapons and communications systems.
During this same period, entry-level soldiers have demonstrated declining
reading, writing and computing skills. The discontinuity between
increasingly demanding jobs and decreasingly skilled personnel has
constituted a substantial training problem for the Army. One means of
addressing this problem has been provided by the U.S. Army Research
Institute (ARI) in the form of a hand-held, portable, computerized tutor
that teaches technical, job-related subject matter.

Most computer-based instructional systems consist of desk top devices
that are very costly and are confined to a site to which users are also
confined in order to benefit from the instruction provided. The
four-pound, battery-operated device called the Hand-Held Tutor, which was
developed in accordance with ARI's specifications, is intended for
out-of-classroom environments (mess halls, motorpools, barracks, etc.) to
convert waiting periods into training opportunities. Each soldier,
therefore, can work with a tutor independently in a variety of settings
rather than sharing a microcomputer terminal at a fixed location. The
device was also required to incorporate the following features:

1- diagnostic pretests

2- self-paced instruction

3- gaming

4- instruction compatible with varying
   initial knowledge levels
   initial motivation levels
   rates of learning

5- frequent corrective feedback

The exterior features of the tutor include a 9" by 11" plastic case
with an indentation molded on its top surface to hold a 5" by 5" booklet
that provides instructional information and directions for interacting
with the computer. Above the booklet is a multifunction liquid crystal
diode display screen that includes a two-digit counter and twenty-nine
character spaces for questions, instructions, definitions and feedback.
Below the booklet is a keyboard equipped with domed conductors to provide
tactile feedback to the user. The keyboard displays numerals 0 through 9,
letters A through C, and the words SAY, ERASE, and GO. Beside the display
screen on the upper front surface of the tutor is a built-in speaker and,
in the rear, a jack for alternative earphones. Also on the back of the
tutor casing are a jack for a battery recharger, a switch/volume control,
and a receptacle for plug-in modules that encase a computer chip
programmed for the Military Occupational Specialty (MOS) instruction
provided in the accompanying courseware booklet. The plug-in nature of
this module offers the potential to permit the essential hardware features
to accommodate a great variety of MOS instruction. Selections of all
features of the tutor were based on cost, availability and human factors
considerations. For example, the display screen was chosen to optimize
brightness, contrast ratio, size, character font and legibility within
size and cost constraints. The printed courseware booklet represents an
economical alternative to systems that store text and graphics in computer
memory for display on a CRT.

The major considerations in courseware development included multiple
teaching techniques (gaming, drill and practice, etc.) to maximize a match
with individual learning styles, initial knowledge levels and rates of
learning. Users can make selections from a menu of teaching/testing
options that include gaming. The booklet includes many pictures and other
graphic presentations, and the computer provides both immediate and delayed
visual and oral feedback to responses to multiple choice questions.

The courseware is divided into units that are sequenced from less to
more difficult to promote an early experience of success by the user. Each
unit consists of a Pretest, Explanation, Picture Battle and Word War.
Users can choose any unit to work with and any component within the unit
selected.

The Pretests are short tests that are intended to establish whether
the user is knowledgeable about the subject matter being presented. If
every question, or all but one, is answered correctly, the final score is
presented vocally and the user is permitted to move to any other component
or any other unit or, if desired, to review the Pretest. If more than one
answer is wrong, the user is directed to return to the first Pretest item,
review the test with accompanying corrective feedback, and then is
directed to the Explanation component in which the subject matter is
taught. This component includes test questions as a check on the progress
of the instruction.
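
The Pretest routing rule described above can be sketched in a few lines.
This is an illustration only, not the tutor's firmware: the function name
and the two return labels are hypothetical, and only the "no more than one
wrong answer" threshold comes from the text.

```python
def pretest_next_step(answers_correct):
    """Decide what follows a Pretest.

    answers_correct: list of booleans, one per Pretest item
    (True = answered correctly). Names here are illustrative.
    """
    wrong = sum(1 for ok in answers_correct if not ok)
    if wrong <= 1:
        # Every item, or all but one, correct: the score is spoken and
        # the user may move to any component or unit, or review.
        return "free_choice"
    # More than one wrong: review with corrective feedback, then go to
    # the Explanation component.
    return "review_then_explanation"
```

For example, a user with a single wrong answer keeps free choice of the
next component, while a user with two wrong answers is routed through the
review and then to the Explanation.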

The Picture Battle component requires matching pictures or graphic
presentations with visual/oral stimuli. This component displays
projectiles at each end of the display screen representing friendly and
enemy targets. Correct responses result in movement of the friendly
projectile toward the enemy target and incorrect responses result in the
same kind of movement of the enemy projectile. The objective is to destroy
the enemy target before the enemy projectile reaches the friendly one. The
impact with the enemy target is accompanied by a sound resembling an
artillery shell exploding. The impact with the friendly target only
results in both projectiles returning to starting positions to re-start
the game.
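
A minimal sketch of the Picture Battle scoring described above follows.
The track length (the number of moves needed to reach a target) is not
given in the text, so the value used here is an assumption, as are the
function and key names.

```python
def picture_battle(responses, track_length=5):
    """Play one Picture Battle session.

    responses: iterable of booleans (True = correct match).
    track_length: assumed number of moves needed to reach a target.
    """
    friendly = enemy = 0   # moves made by each projectile
    restarts = 0
    for correct in responses:
        if correct:
            friendly += 1
            if friendly == track_length:
                # Friendly projectile hits the enemy target: game won.
                return {"won": True, "restarts": restarts}
        else:
            enemy += 1
            if enemy == track_length:
                # Enemy projectile hits the friendly target: both
                # projectiles return to start and the game re-starts.
                friendly = enemy = 0
                restarts += 1
    return {"won": False, "restarts": restarts}
```

Note that a loss carries no penalty beyond the reset, which matches the
text: the impact with the friendly target only re-starts the game.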

Word War is a component that is independent of the booklet. Both
questions and multiple choice answers are presented by the computer in the
form of electronic flashcards on the display screen. The instructional
method calls for drill and practice in an increasing ratio review format.
That is, incorrect responses result in the question being presented again
after one succeeding question, and once again after three additional items
have been presented. Multiple choice answers to questions answered
incorrectly are randomly selected from other choices stored in the tutor's
chip. Also, the position of the correct answer choice is randomly varied.
Increasing ratio review has been demonstrated to shift learned information
from short-term to long-term memory.
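
The increasing ratio review schedule can be illustrated with a short
sketch. It assumes, for simplicity, that each missed item is re-presented
exactly twice (after one intervening item, then after three more) whatever
the later answers are; the function name and the handling of an exhausted
item pool are assumptions, not details from the tutor.

```python
from collections import deque


def presentation_order(questions, missed, gaps=(1, 3)):
    """Return the order in which items are shown.

    questions: items in their original order.
    missed: set of items answered incorrectly on first presentation.
    gaps: number of other items shown before each re-presentation.
    """
    fresh = deque(questions)
    pending = []   # entries: [items_still_to_wait, question, later_gaps]
    order = []
    while fresh or pending:
        due = [p for p in pending if p[0] <= 0]
        if due:
            entry = due[0]
            pending.remove(entry)
            question, remaining = entry[1], entry[2]
        elif fresh:
            question = fresh.popleft()
            remaining = list(gaps) if question in missed else []
        else:
            # No new items left to fill the gap: bring the next review
            # forward (an assumption; the text does not cover this case).
            entry = min(pending, key=lambda p: p[0])
            pending.remove(entry)
            question, remaining = entry[1], entry[2]
        order.append(question)
        for p in pending:
            p[0] -= 1   # one more item has gone by for each waiting review
        if remaining:
            pending.append([remaining[0], question, remaining[1:]])
    return order
```

With seven items A to G and only A missed, the sketch yields the order
A, B, A, C, D, E, A, F, G: A returns after one succeeding question and
again after three additional items.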

The  tutor,  therefore,  incorporates  varying  teaching  techniques, 
presentation  modes  and  kinds  of  feedback  in  order  to  enhance  acquisition 
and  retention  of  the  selected  subject  matter.  The  courseware  is  heavily 
weighted  with  frequent,  short  tests  to  permit  the  user  to  monitor  progress 
in  acquiring  the  needed  information  and  to  focus  attention  on  the  most 
relevant  materials. 


In 1981, ARI awarded a contract to Franklin Research Center (FRC) and
its subcontractor, Educational Testing Service (ETS), for initial
development of the tutor. To test the feasibility of the device, it was
decided to take advantage of the existing research foundation in
computer-based methods for training vocabulary.


Accordingly, a vocabulary tutor was developed to teach technical
terminology to Cannon Crewmen. This tutor was evaluated at Fort Polk,
Louisiana, and Fort Drum, New York. Results demonstrated substantial
increases in scores on a vocabulary test, and showed that soldiers enjoyed
using the tutor and found it trouble-free and easy to use. Most soldiers
tried all of the components in the units and most completed all of the
units. There was a considerable difference between the fastest and
slowest time required to complete all units, which suggests that
self-pacing is an appropriate feature of the program.


In 1983 FRC and ETS were awarded a contract that called for the
adaptation of the tutor to teach mathematics. The courseware was
developed to address the needs of MOS 12B, Combat Engineers. In addition,
the contractor constructed an RS-232-C serial interface for the tutor. This
effort provided a hardware/software data link through which the tutor can
communicate with other computers. The demonstration model permits the
desk top microcomputer to download course materials to the tutor, which
can then be disconnected and transported to another site for study. A
diagnostic feature allows the microcomputer to upload responses to
test questions, assess the needs of the user, then download appropriate
homework on which the user can practice before returning for retesting.
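
The upload, assess and download cycle described above might look like the
following on the desk top microcomputer. This is a hedged sketch only: the
paper does not say how needs are assessed, so the per-topic pass mark, the
data layout and all names below are assumptions made for illustration.

```python
def select_homework(responses, homework_bank, pass_mark=0.8):
    """Pick homework for topics on which the user scored below pass_mark.

    responses: {topic: [bool, ...]} uploaded from the tutor
               (True = test question answered correctly).
    homework_bank: {topic: [exercise, ...]} held on the microcomputer.
    pass_mark: assumed proportion correct needed to skip homework.
    """
    assigned = {}
    for topic, results in responses.items():
        if not results:
            continue   # no test questions answered for this topic
        score = sum(results) / len(results)
        if score < pass_mark:
            # These exercises would be downloaded to the tutor.
            assigned[topic] = homework_bank.get(topic, [])
    return assigned
```

The returned mapping stands in for the material that would be sent back
over the serial link before the user leaves for further practice.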


This development greatly increases the flexibility of the tutor and
provides the potential for storing instructional materials for a variety
of MOS, each set of which can be transferred to the portable device as
needed.

In 1984, ETS and its subcontractor, Advanced Technology Laboratories,
were awarded a contract that required development of plug-in modules to
accompany the mathematics courseware. This application of the tutor is
under evaluation now at the Naval Ordnance Station in Indian Head,
Maryland. This evaluation site is appropriate because the mathematics
needs of the MOS 12B soldiers proved to be of so general a nature that a
broad range of service members are expected to benefit from the
instruction. In addition, the contractor is in the process of adapting the
tutor to provide M1 Tank Commanders with instruction in fire commands and
degraded mode gunnery. This application will be evaluated in January of
1986.

The next application of the tutor planned by ARI is for instruction in
English-as-a-second-language. We intend that this version incorporate a
miniaturized tape recorder to simultaneously teach reading and
understanding spoken English as well as pronunciation.


DESIGN  OF  AN  OCCUPATIONAL  DATA  ANALYSIS  SYSTEM 


MAJOR  C.P.  WHEELER  ROYAL  ARMY  EDUCATIONAL  CORPS 
ARMY  SCHOOL  OF  TRAINING  SUPPORT 

INTRODUCTION 

1. The British Army has for some time based its occupational data analysis
programme on a version of CODAP implemented in the IBM format and currently
running on an IBM 3083 mainframe computer. This version has been superseded by
CODAP 80, which was considered to be the natural progression in CODAP
evolution. Its proposed implementation, however, highlighted difficulties that
would constrain the exploitation of the power of CODAP 80. It was decided
therefore, that in view of the progress made by the United States Air Force
Human Resources Laboratory in the development of Advanced CODAP, the time was
appropriate to review the Army's requirements and investigate the possibility
of creating a dedicated occupational data analysis system.

AIM

2. The aim of this paper is to describe the analysis of the total requirement
of the Army and the subsequent design and specification of a suitable system
that best meets those requirements.

SYSTEM  ANALYSIS 

3. The decision to use a formalised system analysis technique was taken at an
early stage in the investigation. The technique chosen was Learmonth and
Burchett's Structured System Analysis and Design Method (SSADM), which is a
Ministry of Defence approved system. SSADM is a data driven method which takes
as input an initial statement of requirement and produces the following
outputs:

a.  Program  specifications 

b.  User  clerical  procedures 

c.  Operating  instructions 

d.  File  design  or  data  base  schema 

e.  Plan  for  testing  and  quality  assurance 

4. System analysis and design is tackled as six phases with each phase broken
into steps and activities. There are clearly defined interfaces between steps
in the form of working documents and criteria for review and project
reporting. These six phases fall nicely into two sections:

a. The What - Systems Analysis

(i). Analysis of Current System

(ii). Specification of the Required System

(iii). Selection of an Option for Implementation

b. The How - Systems Design

(i). Detailed Data Flow Design

(ii). Detailed Procedure/Processing Design

(iii). Optimisation of the Physical Design

Current  System  Analysis 

5. The system analysis in this instance was a fairly simple process and did
not require a great depth of investigation or data flow. The current system,
though incorporating an automatic data processing procedure, is fundamentally
a manual one. There is no standardised questionnaire format and the design is
very much a matter of study team style. The transcription of raw data from the
questionnaire to a computer readable format is a data preparation bureau
activity which produces a magnetic tape containing the data in ICL format.
This tape is mailed to the computer centre together with a magnetic tape
transcription of the job control cards and run within the CODAP shell. Results
printouts are sent to the initiating agency for subsequent analysis and, if
necessary, re-runs of the data with other options from the CODAP suite of
programs are completed. Finally a report is produced with appropriate
recommendations. A modified data flow diagram is given at figure 1 which, for
the purposes of this paper, adequately describes the current system.


[Figure 1. Current System Data Flow Diagram. Boxes recoverable from the
original include: decision to conduct a study; prepare a study plan; book
data prep time; book computer run time; design a questionnaire; produce
pilot questionnaire; sample population; produce final questionnaire;
distribute questionnaire; total population; check and prepare returned
questionnaires; data preparation; produce mag tape; produce job control
cards; produce in appropriate format; analyse results; write report.]


The production of this document achieved two aims: firstly, it ensured the
analyst had a complete understanding of the system and secondly, it enabled
him to identify those processes that were responsible for system operational
shortfalls.


Problem  Definition 

6. There were two areas where problems occurred: those that were inherent in
the CODAP system and those that were a result of the particular implementation
at the Royal Army Pay Corps Computer Centre (RAPC CC). They are addressed
separately below.

a. Inherent Problems

(i). The version of CODAP in use is a translation of the
original USAF SPERRY 1100 CODAP and reflects the state of
development in 1977 (the year of acquisition). On-line data
base interrogation is not possible. Confirmation of a point
or the establishment of a trend requires a re-run of the raw
data with fresh job control cards. This process is lengthy as
fresh computer run-time slots need to be booked.

(ii). The processed data is output in the form of tables
which require some expertise and patience to translate. When
identifying trends this is arduous and encourages
misinterpretation of the facts.

(iii). There is a limit of 999 task history and/or time data
items available to each study. Though for some studies this
is more than adequate, in many of those proposed it
may well be a constraining factor. In a recent study of Army
makers' education the questionnaire had to be "sliced" to
accommodate some 2200 data items from 600 respondents. This
considerably added to the analysis task.

b. Implementation Problems

(i). CODAP was designed to utilise Optical Mark Reader (OMR)
data input peripherals. The host Computer Centre did not
permit the use of this facility in the original implementation
for security reasons. It was feared ASTS control of the
data (and by implication the computer) would compromise the
integrity of the system and its other applications programs. The
result is a resource intense and time consuming data preparation
process. Currently based on a bureau, the data magnetic tape is
prepared in ICL format which has to be transcribed into IBM
format prior to run-time. Slippages do occur in this procedure,
giving worst case times in the order of 25 weeks for data
preparation. The best case was 3 weeks with an average of 10
weeks.

(ii). CODAP is a low priority task at the Computer Centre;
consequently their resource and time allocation to it is
limited. This further aggravates the problem indicated above.
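
The "slicing" forced by the 999-item limit noted under the inherent
problems can be illustrated with a short sketch. Only the limit of 999 and
the figure of some 2200 data items come from the text; the function name
and the idea of slicing in questionnaire order are assumptions.

```python
def slice_items(item_ids, limit=999):
    """Split a questionnaire's data items into runs of at most `limit`.

    Each slice would be analysed as a separate study, which is what
    adds to the analysis task.
    """
    return [item_ids[i:i + limit] for i in range(0, len(item_ids), limit)]
```

Under these assumptions a questionnaire of 2200 items needs three slices
(999, 999 and 202 items), so every cross-slice comparison has to be made
by hand outside the system.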

Specification  of  the  Required  System 

7. Traditional job analyses for training have concentrated on the collection,
amongst others, of information relating to task difficulty, importance and
frequency (DIF analysis). Though apparently simple, all of these factors,
if taken in isolation, are liable to produce data that will ultimately result
in a wrong interpretation by the analyst and subsequent incorrect training
decisions being made.

8. To seek estimates of task difficulty alone can result in invalid training
decisions. Estimates can be made by the job incumbent based on differing
factors: size, weight, environment, availability of spares, as well as the more
obvious and expected responses concerning lack of training, lack of practice
or lack of ability. It is now accepted that a less ambiguous measure of
difficulty is the time taken to learn to perform the task to some required
level. This factor has some value in skill retention analysis.

9. Similarly, estimates of importance of a task are of little value without
further amplification of the frame of reference. The estimates of "importance
to whom" and "importance for what" can become quite subjective and difficult
to quantify. If it is importance that we wish to measure then the job
incumbent's supervising officer is better placed to make that assessment.
Within defence requirements, importance can be considered to be a composite
of:

a. Immediacy of skill requirement after training

b. Consequence of inadequate training

10. Precise measures of frequency of performance cannot be made by job
incumbents, often due to the cyclic or seasonal nature of jobs. They can,
however, indicate that they spend more time on one task than another in
relative terms. It is accepted that such a relative time scale can produce a
more reliable measure of time spent on tasks and that subsequent calculation
of percentage involvement is more accurate.
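
The calculation of percentage involvement from relative time ratings can
be sketched as below. The normalisation shown (each rating divided by the
respondent's total) is the usual way such scales are converted and is an
assumption here, as are the function name and the example rating values.

```python
def percent_time_spent(ratings):
    """Convert one respondent's relative time ratings to percentages.

    ratings: {task: relative rating}, e.g. on a 1-9 scale (assumed).
    Returns {task: percentage of that respondent's total job time}.
    """
    total = sum(ratings.values())
    if total == 0:
        raise ValueError("respondent rated no tasks")
    return {task: 100.0 * r / total for task, r in ratings.items()}
```

A respondent who rates one task 1 and another 3 is thus taken to spend 25
and 75 per cent of the time on them, without ever being asked for an
absolute frequency.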

11. These factors were accepted by the Army at the time CODAP was first
implemented; in reality it was a major reason for its acquisition. The
situation has not changed: CODAP is still considered to be the most
appropriate analytical tool available to military occupational analysts and
training designers. It was decided that any future occupational data analysis
system should be based on the most recent development of CODAP. Therefore
system design should concentrate on improving those areas of operational
shortfall identified in the analysis.

12. The most important requirement for a modern occupational data analysis
system is an automated data capture facility. Data can be rapidly transcribed
from standardised questionnaires and input directly to computer memory. This
is an OMR operation. The ability to interrogate the data base in real-time is
an advantage. The analyst should be able to manipulate data in an interactive
manner using an on-line query language. Care also must be taken to ensure the
system file and record lengths are of sufficient magnitude to handle the
largest studies. Failing this, 'slicing' of the data should not be seriously
detrimental to the study.

13. The output of the system should be readable with an appropriate use of
graphics to display data in a format such that trends are highlighted. A more
thorough examination of data relevant to the problem is then possible. The
total system must be able to respond rapidly (within one week) to raw data
input from respondents. It must therefore be co-located with the system
managers, the system hardware being dedicated to occupational data analysis.

Possible  Solutions 

14.  The  alternatives  available  to  ensure  the  continuation  of  an  occupational 
data  analysis  service  are  as  follows: 

Option  1  -  Continue  with  the  existing  system. 

Option  2  -  Implement  CODAP  80  on  the  existing  computer. 

Option 3 - Implement a commercial package (eg SPSS) at the RAEC
Centre.

Option  4  -  Implement  USAF  CODAP  at  the  RAEC  Centre. 


15. Option 1 - Though this is a solution option, the problems outlined
previously preclude it from consideration in all but the most extreme of
circumstances. The service available to the Army's Training Organisation would
be subject to continual degradation leading eventually to inefficient
training with all that that implies.


16. Option 2 - CODAP 80 was designed to overcome the deficiencies of CODAP. It
is written in FORTRAN and well documented. There have, however, been some
shortfalls in the expected program run-time target efficiency. Though this may
well be acceptable on a dedicated system, this is not the case with our host
computer. Run-times measured in hours or even days would consign CODAP studies
to those periods least used by the operational system. The outcome of this
would be an even longer turn-around time for processed data. It is possible
that the RAPC CC would consider the task too resource demanding and not
continue to host CODAP. A second and perhaps more important point is the
provision of interactive facilities to ASTS. Users of CODAP 80 are able to
interrogate the data base on-line. The analyst is able, therefore, to
manipulate the data and run the requisite program that best meets his needs
from a remote terminal. To achieve this a communications link would need to be
established between ASTS and RAPC CC (a distance of some [text missing in
source]). [The link] would be a British Telecom leased line (telephone line)
and would require IBM compatible terminals at each end to ensure correct
communications protocol. At this time, however, the RAPC CC insist on the
terminal being able only to access data and not to manipulate it. The reason
for this is that the security of the RAPC CC computer system may be
compromised and unauthorised access to other programs possible. CODAP 80 is
essentially an interactive medium and without the ability to sort data and run
programs from a local terminal, much of its power would be lost. Similarly,
CODAP 80 can accept data from OMR data input devices; in the past RAPC CC have
not permitted their use.


17. Option 3 - Several commercial packages were investigated and a general
comment on their suitability was that none were specifically designed for
occupational data analysis in a military environment. Also, as far as can be
ascertained from the literature, they are not able to produce a job
description. Commercial packages are quite expensive; typical costs over a
five year period are £47750 (costs for SPSS).

18. Option 4 - This option is based on the most recent implementation of the
USAF CODAP running on a SPERRY UNIVAC 1100/81 mainframe computer. A
preliminary investigation into this system is encouraging; it would appear
that those shortfalls of the current CODAP system have been overcome. A
representative of ASTS will shortly visit the USAF HRL at Brooks AFB to
confirm the suitability of it to meet the requirements of the British Army. An
implementation of this version of CODAP would be hosted on a SPERRY Series 11
computer running under the SPERRY 11 Operating System and located at ASTS.

Proposed System

19. The proposed system should be based on Option 4 and should consist of USAF
CODAP (if found to be suitable), running on a SPERRY System 11 computer with
automated data input peripherals, OMR, and appropriate output devices, VDUs
and printers. This system is to be located at ASTS, managed by the Systems
Group of the RAEC and dedicated to occupational data analysis. The
departure from the systems analysis routine in specifying hardware without a
design phase is justified by the fact that CODAP was written for this series
of computers and exploits much of its hard wired power. Any re-configuration
to utilise other computers would result in an expensive and lengthy process
with no [guarantee] of operational efficiency at the end, thus narrowing the
field of choice. The OMR question has been thoroughly investigated and a
suitable [device] identified. This is a KAISER OMR 80, which has the ability to
read both sides of a document on each pass and identify those documents that
contain errors and segregate them. The OMR 80 can read up to 10,000 documents
per hour.


20. The disadvantage of this proposal is its high initial cost, in the region
of £[figure illegible in source]. The advantages, however, would more than
compensate for this in specific cost savings of training time and in the
establishment of an effective and efficient training system. This system would
be able to take advantage of future development by the USAF HRL with the
improvements to the service that that entails. Also the Series 11 computer
would possess some spare capacity for other tasks, eg office management
systems, library and course administration, which will add to the general
efficiency of this establishment.


Justification

In control systems theory it is a fact that a system without a clear
purpose or suitable controlling feed-back will eventually degenerate to
chaos. The Army Training System is based on a scientific methodology, SAT,
that requires accurate training objectives derived from comprehensive job
descriptions. Feedback in the form of internal and external validation
provides the control and ensures the training objectives are valid such that
training [fits] the soldier for the task. The current implementation of
CODAP has shortcomings in the capture and presentation of data. There is some
doubt as to the willingness of RAPC CC to continue to host CODAP. The time is
right to make good the [identified] problems with a modern version based on
a dedicated computer under the direct control of ASTS.


Recommendation

It is recommended, therefore, that:

a. With the proviso above, the USAF version of CODAP be adopted
for use by the British Army.

b. A suitable purchase of hardware be made that is able to run CODAP,
to be a Sperry System 11 computer, peripherals and a KAISER OMR 80.

c. A CODAP consultancy and management cell be established at ASTS to
control, advise and manage the occupational data analysis
requirements of the Army with a possible extension to the Royal Navy
and the Royal Air Force.
1. Introduction

The principal occupational analysis technology in the Air Force is the
Comprehensive Occupational Data Analysis Programs (CODAP) software system,
which has supported a major occupational research program within the Air
Force Human Resources Laboratory (AFHRL) and a major operational
occupational analysis program within the Air Force Occupational Measurement
Center (USAFOMC) since 1967. From a software standpoint, CODAP can be
defined as a package of computer programs used to input, process, organize,
and report occupational data from job inventories. From a measurement
standpoint, CODAP can be defined as a set of procedures which focus on the
analysis and comparison of individual and group job descriptions and their
associated biographical data, as well as on individual tasks and groups of
tasks (task modules) and their associated characteristics. From an
applications standpoint, CODAP can be defined in terms of its significant
contributions to the mission accomplishment of the Air Force manpower,
personnel, and training management (MPT) functions. These contributions
include: providing a data-based approach to evaluating and updating Air
Force officer and enlisted classification structures, providing an empirical
means of restructuring and redesigning jobs, providing data that has been
instrumental in eliminating unnecessary training and in pinpointing specific
training requirements, and providing a scientifically sound basis for
realigning entry-level aptitude requirements across career fields.

2.  Problem 

The CODAP system began 20 years ago as a software package of
approximately 18 general-purpose programs. However, in order to keep pace
with the rapidly expanding needs of occupational researchers and analysts
since that time, the system was forced to expand unsystematically into a
somewhat confusing aggregate of more than 60 general-purpose programs. Over
time, the system became increasingly difficult to maintain, modify, or
augment without extensive programmer training and experience. Many of the
programs had been hastily developed to meet short suspenses, with little
time available for producing proper program documentation; and many of these
programs were partially redundant with already existing programs. Files
maintained in the system had been built in a variety of formats and stored in
a variety of media by a succession of programmers using a variety of
programming standards and styles. New and advanced developments that had
been added to the system for some time required that many of the existing
programs run more efficiently and interface more easily.

The CODAP system had not only grown in size and complexity since its
inception, but the evolutionary process had become increasingly more dynamic
as a direct result of significant hardware improvements to the AFHRL Sperry
system, major enhancements to the basic structure of the CODAP software
system, a growing sophistication in the methods and procedures required by
Air Force occupational researchers and analysts, and a wider range of users
and applications. Consequently, it became increasingly urgent that a major
system redesign effort be launched to consolidate and use to better
advantage the many additions and modifications that had been incorporated
into CODAP over the years and to create new technology and software to meet
the current and anticipated methodological and applications
requirements of CODAP users.

3. OBJECTIVES

The objectives of the CODAP redesign project were to improve Air Force
occupational analysis methodology, translate methodological improvements
into R&D technological capabilities, standardize occupational analysis
procedures, and generate efficiencies in the occupational analysis process
that would significantly improve quality of product with substantially less
expenditure of scientist/analyst man-hours and computer processing time.
Increased data processing efficiency would also make feasible the
development of complex analysis programs, especially in the area of
automated job analysis, that would otherwise have been too costly to run.

4. PLANS

Early in 1983, AFHRL negotiated a task ordering contract with an 8A
(small business, minority-owned) contractor, the MAXIMA Corporation. The
first task submitted to MAXIMA was a 26-month project (later extended to 29
months) to rewrite/convert the CODAP system, as needed, to bring it in line
with the most recent standards for software development, some of which
represented requirements established by the AFHRL Technical Services
Division (TS). The initial meeting with the contractor identified the
following as the most urgent needs for revision of the CODAP system:

1. Converting the general-purpose CODAP programs from FIELDATA FORTRAN,
which was no longer being supported by TS, to the FORTRAN 77 Standard (ASCII
FORTRAN). "ASCII" is an acronym for the American Standard Code for
Information Interchange.

2. Converting the utility programs from the old, TS-developed PILOT
language, which was no longer being supported by Sperry, to the new,
TS-developed PRISM language.

3. Converting the system files from a variety of formats to standardized
mass storage files, wherever feasible.

4. Developing simplified processing procedures (runstream generators)
for often-used [illegible] strings.

5. Improving formats of printed reports to enhance readability and
interpretability.

6. Conducting exploratory development of new analytic capabilities,
especially in the areas of profile analysis, nonhierarchical clustering,
two-way clustering (cases x tasks), module technology, and automated
job analysis.


At this meeting, also, major new Air Force manpower, personnel, and
training (MPT) programs which would require the extensive use of
occupational survey data were discussed in terms of how these projects might
impact the planning of the CODAP redesign. Examples of these emerging
applications included such research and development (R&D) projects as: the
Training Decisions System (TDS), the Basic Cognitive Skills Project, the
Advanced On-the-Job Training System (AOTS), the Task Qualification
Assessment (TQA) Project, the Task Identification and Evaluation System
(TIES), and various Performance Measurement projects. In addition to major
programs, the new CODAP system would be required to support new operational
applications for Air Force managers and decision makers. An example of such
an application would be the use of CODAP-based occupational survey data to
help establish outline "testing importance" specifications for the
development of enlisted promotion tests.


As a consequence of this meeting, three major project objectives were
identified: increased operational efficiency, improved system
maintainability, and expanded analytic capability. Each major project
objective was, in turn, further broken down into technical goals, which will
be presented and discussed in the next section, no longer as goals, but as
redesign accomplishments. These accomplishments show how the redesigned
CODAP system (ASCII CODAP) differs from and is an improvement upon the old
CODAP system (FIELDATA CODAP).
As compared to the old FIELDATA CODAP system, the new ASCII CODAP system
has achieved the three major project objectives listed in the previous
section, as follows:


In ASCII CODAP, Operational Efficiency Has Been Increased.

Run setups have been simplified.

- Functional program names used
- Mnemonics employed to name control card options
- Control cards standardized
- Program execution by processor control cards
- Increased flexibility in filename specification
- Elimination of separate filename cards

Audit trails have been improved. The first page of every report
contains information such as the report ID, when the report was
created, the IDs of all input files, options selected, and
applicable selection or cutoff parameters for cases and/or tasks.

Resource utilization has been improved.

- Reduced core requirements for many programs
- Reduced running times for many programs (e.g., job description
  program runs 15 times faster)
- Elimination of three-reel files


Turnaround has been speeded up.

Reduced core requirements and running times for many frequently
run, multiple-execution programs, as well as the elimination of
three-reel files, has permitted daytime running of these programs.
Conversion of files to mass storage format has speeded up
processing and allows some programs which were previously run
only in batch mode to be run in demand mode.

Fewer runs are required to accomplish a standard analysis.

- Elimination of redundant programs
- Combining of programs which perform related functions
- Development of functionally pure programs (e.g., sample selection
  extracted from multiple programs and consolidated into a single
  program)

Training requirements for computer technicians have been simplified.

- Development of computer-based training package (in progress)
- Better documentation
- Reduced number of programs
- Functionally pure programs
- Mnemonic control card options
- Visible cue report format specifications
- Standardized control cards
- Automatic process generation
- Separate report file process

In ASCII CODAP, system maintainability has been improved.

Structured programming used.

Conversion to FORTRAN 77 Standard (ASCII FORTRAN) and PRISM.

Fewer programs (reduced from 120 to 83).

Functionally  discrete  programs. 

Reduced amount of source code.

- Assembly language code reduced
- Total lines of code reduced (ASCII: 27,186 vs. FIELDATA: 48,477)
- Elimination of redundant code
- Interfacing with non-CODAP software rather than retaining
  similar CODAP programs

Increased internal documentation (47% more lines of comments).

Increased supporting documentation, which is all in automated form.

- Users manual
- Standards documentation
- Programmer reference guide
- Subroutine documentation
- Subprogram documentation
- File format documentation

System has been standardized, wherever feasible.

- Program code
- Program names
- Program documentation
- File formats
- Subroutine library

Formalized test and acceptance procedures to ensure reliability.
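
The size reduction quoted in the list above can be checked with a line of arithmetic. The following is a minimal sketch only; the function name `percent_reduction` is ours for illustration and is not part of CODAP. It takes the two line counts exactly as they are printed in the text.

```python
def percent_reduction(old: int, new: int) -> float:
    """Relative reduction from old to new, expressed as a percentage."""
    if old <= 0:
        raise ValueError("old count must be positive")
    return (old - new) / old * 100.0


# Line counts as quoted in the text: FIELDATA CODAP vs. ASCII CODAP.
FIELDATA_LINES = 48_477
ASCII_LINES = 27_186

loc_reduction = percent_reduction(FIELDATA_LINES, ASCII_LINES)
print(f"Total lines of code reduced by about {loc_reduction:.1f}%")
```

On these figures the redesigned system carries roughly 44 percent fewer lines of source code than its predecessor.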
In ASCII CODAP, analytic capability has been expanded.

Increased system limits.

- From 1,700 to 3,000 tasks
- From 1,000 to 9,939 task modules
- From [illegible] to 3,000 tasks per module
- From 900 to 2,000 history variables
- From [illegible] to 9,999 computed variables


Unchanged system limits.

- 20,000 cases total
- 7,000 cases for clustering
- 26 duties
- 66 characters for variable description


New features and capabilities.

- Clustering technology expanded
    additional overlap indices incorporated (four new indices)
    task clustering capability added
    nonhierarchical clustering capability added
    profile analysis capability added

- Module technology expanded
    tasks can be summarized into modules
    modules can be used like tasks
    module values can be clustered
    module data collected from job incumbents/expert raters can be
    processed directly

- Job typing technology expanded
    pairwise group comparison program (AUTOJT) capabilities expanded
    core task analysis program (CORSET) capabilities expanded
    program developed which makes initial selection of job types
    automatically (JOBSET)

- Interrater reliability technology expanded

Improved final products.

- More readable and interpretable report formats
- Greater availability and more flexible use of free text
- Use of upper- and lower-case letters and overstriking
- Greater standardization of formatting for reporting similar
  kinds of data

External system interfaces to non-CODAP software.

- Data distribution routines
- Correlation and regression software
- Factor analysis software
- Other statistical packages, such as BMDP and IMSL


ISSUES RELATED TO CODAP TECHNOLOGY TRANSFER

A. Transfer of Air Force ASCII CODAP

Air Force Regulation (AFR) 300-6 governs the release of Air Force-owned
or developed computer software packages. It states in para 11-7b(2):
"Computer programs and related technical data are not considered 'records'
within the Congressional intent of 5 U.S.C. 552; these items are considered
property." It further states in para 11-7b(1): "Software packages will not
be released to the private sector except when in the best interest of the
Government." In the event that Air Force software, such as ASCII CODAP, is
released to a private sector firm, the requester must certify that the
software will not be published for profit or in any manner offered for sale
to the Government and will not be sold or given to any other activity or
firm without the prior written approval of the Air Force. On the other
hand, AFR 300-6 permits the release of ASCII CODAP to the public sector,
such as government agencies and universities, provided that the recipient
signs a statement of terms and conditions which frees the Air Force of any
liability and/or responsibility for the software and requires the recipient
to state the intended use of the software.

B. Cost of Compatible Hardware Versus Cost of Software Conversion

Sperry representatives were asked to speculate as to what would be the
necessary configuration for a minimal Sperry system to meet the requirement
that there would be [illegible] users on a system dedicated to running
CODAP. It was agreed that a minimally adequate system would consist of a
Sperry System 11 or Sperry 11[illegible] or Sperry 1100/61 with a Mega Word
of main memory, an [illegible] disk system (and controller) containing 50K
tracks (89 million words), three 1600 BPI tape drives with tape controller,
console, upper-lower case printer, and a communications subsystem. Such a
system would cost approximately $[illegible]. ASCII CODAP could be
transferred to such a system at little cost and be up and running almost
immediately. On the other hand, the contractors who produced ASCII CODAP
estimated that it may cost as much as $[illegible] and require one year for
two senior systems analysts and two programmers conversant with CODAP and
familiar with both the Sperry system and the non-Sperry host system to
accomplish the conversion of ASCII CODAP for running on the non-Sperry
system. Given these two alternatives, the choice of which way to go should
be an easy one. Nevertheless, justifying the purchase of another computer
system when your organization already has one is usually more difficult than
getting authorization and funds to do a software conversion.

IV. CONCLUSION

The new Air Force ASCII CODAP system is now undergoing final test and
acceptance by the AFHRL Technical Services Division (AFHRL/TS). The process
is moving along smoothly, and the new system should be ready for release to
the USAF Occupational Measurement Center (USAFOMC) and other Sperry/CODAP
users [illegible].

REFERENCES

AFR 300-6 ([illegible], 11 July). Automatic Data Processing Resource (ADPR)
Management.

Phalen, W.J., Weissmuller, J.J., & Staley, M.R. (1985, May). Advanced
CODAP: New analysis capabilities. Fifth International Occupational
Analysts Workshop, San Antonio, TX.

Staley, M.R., Weissmuller, J.J., & Phalen, W.J. (1985, May). ASCII CODAP:
[illegible] for emerging applications. Fifth International Occupational
Analysts Workshop, San Antonio, TX.

Weissmuller, J.J., Staley, M.R., & Phalen, W.J. (1985, May). ASCII CODAP:
Quarterly status report. Fifth International Occupational Analysts
Workshop, San Antonio, TX.
INTRODUCTION


The Occupational Research Data Bank (ORDB) is a computer-based
occupational information system. It provides researchers and managers a
means of rapid on-line retrieval of a variety of current and historical
occupational information on Air Force enlisted specialties and the people
performing duty in them through a set of user-friendly, tutorial programs.
Use of ORDB streamlines background research, permits quick in-depth
orientation to specialties, and provides a database for a rapid response
capability for research and management concerns. The ORDB is of great value
as a tool to increase productivity and as an instrument for longitudinal and
cross-specialty analyses.

Development of the ORDB began in 1978 with the investigation of
available data that would contribute to Air Force occupational analysis and
management. A number of sources and types of information were identified
and have been obtained for inclusion in the ORDB (Carpenter, Archer, & Camp,
1979; Stephenson, 1979; Camp, 1982). The primary contractor for this effort
is the General Services Administration data processing services contractor,
currently OAO Corporation. OAO personnel assigned to this project are
collocated with the monitoring activity (AFHRL/MOMM) at Brooks AFB, Texas.

The ORDB is composed of both digital and hard copy data consisting of
technical reports and studies, Air Force Regulation 39-1 information by
career area, statistical variables summarized for occupations from
individual Air Force members and technical training course data, and
Comprehensive Occupational Data Analysis Programs (CODAP) (Christal, 1974)
studies performed at the Air Force Occupational Measurement Center (OMC).
These types of information have been obtained and are incorporated in the
ORDB. The subsystems which provide for storage and on-line retrieval of the
information are described in the following section. Before going into the
ORDB subsystems, here are two notes of interest:

1. When referencing an Air Force Specialty Code (AFSC), there are three
levels of detail that can be shown using the five numbers of the AFSC. Here
are the types, abbreviations, and examples of these three levels.

Skill Level (S)  e.g., 27430, 42330 (skill levels 1, 3, 5, 7, 9, & 0)
Ladder (L)       e.g., [illegible]73X2, 113X0
Career (C)       e.g., [illegible]XXX, 22XXX


2. References to the figures at the end of this paper are actual slides
from the elaborative case study scenario of the 426X2 (Jet Engine Mechanic)
ladder AFSC, which was the major part of the briefing illustrating how an
occupational researcher could benefit from using the ORDB. Figures 1 and 2
are actual output from the ORDB.
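
The three levels of detail in note 1 can be illustrated with a short sketch. It assumes, from the ladder and career examples given in the notes, that the fourth digit of a five-digit AFSC carries the skill level, that the ladder form replaces that digit with X, and that the career form keeps only the first two digits. The function name `afsc_levels` and the sample code 42652 are hypothetical and are not taken from the ORDB.

```python
def afsc_levels(afsc: str) -> dict[str, str]:
    """Return the skill-level (S), ladder (L), and career (C) forms of an AFSC.

    Assumption: a five-digit AFSC whose fourth digit is the skill level.
    """
    if len(afsc) != 5 or not afsc.isdigit():
        raise ValueError(f"expected a five-digit AFSC, got {afsc!r}")
    return {
        "S": afsc,                      # skill level: all five digits
        "L": afsc[:3] + "X" + afsc[4],  # ladder: skill-level digit masked
        "C": afsc[:2] + "XXX",          # career: first two digits only
    }


# A 5-skill-level code in the Jet Engine Mechanic ladder (illustrative).
print(afsc_levels("42652"))
```

A retrieval keyed on the L or C form would then aggregate every skill level in the ladder or every ladder in the career field.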


SYSTEM OVERVIEW

The ORDB operates on the AFHRL Sperry 1100/81. Seven subsystems are
tailored to the types of data and kinds of retrieval needed by the user.
These subsystems are linked together by a front-end program to simplify the
use of the ORDB. The programs are designed to interact with the user,
assisting in the choice of the appropriate subsystem, and in selecting the
desired information. Each subsystem is described below.


1. Computer Assisted Reference Locator (CARL). The CARL subsystem is
used to reference hard copy occupational data items, such as recurring
reports, occupational survey reports, job inventories, films, and
microfiche, which are stored at the Air Force Human Resources Laboratory,
Brooks AFB, Texas. References are based on user selected keywords. CARL was
obtained from the Navy Personnel Research and Development Center (NPRDC) and
modified to operate on the 1100/81 (Sands, 1978; Sands & Hartman, 1979).
Additional modifications were made to accept AFSCs as keywords and to clarify
user selection of output options.

Each reference stored in the CARL subsystem includes such information as
author, name or title of the reference, type of reference, a brief narrative
description, and an associated list of keywords for each reference (see
Figure 1).

To speed the referencing process, two search techniques have been added
to CARL: Quick and Smart. Quick is a binary search on a given list of
keywords available upon request, while Smart is a character string search
across the list of available keywords. Both respond with the number of
references located and ask if and how the user would like to see output. In
addition, the user can expand or reduce the number of references by using
additional keywords or character strings.
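
The difference between the two search techniques can be sketched as follows. This is an illustration under stated assumptions, not CARL's actual code: the keyword index, the reference numbers, and the function names (`quick_search`, `smart_search`, `matching_references`) are all hypothetical, and combining several keywords by union is our assumption, since the text does not say how keywords are combined.

```python
from bisect import bisect_left


def quick_search(sorted_keywords: list[str], keyword: str) -> bool:
    """Binary search for an exact keyword in an alphabetically sorted list."""
    i = bisect_left(sorted_keywords, keyword)
    return i < len(sorted_keywords) and sorted_keywords[i] == keyword


def smart_search(keywords: list[str], fragment: str) -> list[str]:
    """Character-string search: every keyword that contains the fragment."""
    return [k for k in keywords if fragment in k]


def matching_references(index: dict[str, set[int]], keywords: list[str]) -> set[int]:
    """References indexed under any of the given keywords (union, assumed)."""
    refs: set[int] = set()
    for k in keywords:
        refs |= index.get(k, set())
    return refs


# Hypothetical keyword index: keyword -> reference numbers.
INDEX = {
    "426X2": {1, 4},
    "JET ENGINE": {1, 2},
    "JOB INVENTORY": {3, 4},
    "TURBINE ENGINE": {2},
}
KEYWORDS = sorted(INDEX)

print(quick_search(KEYWORDS, "JET ENGINE"))   # exact keyword lookup
hits = smart_search(KEYWORDS, "ENGINE")       # substring lookup
print(hits, len(matching_references(INDEX, hits)))
```

Quick needs the exact keyword but inspects only a logarithmic number of entries; Smart scans the whole list and so tolerates partial terms. In both cases the count of matching references is available before any output is produced, which is what lets the system ask the user how to proceed.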

2. Aptitude Requirements Component (ARC). The ARC subsystem contains
AFSC descriptions (for ladder and career field), progression ladders, and
prerequisite data for the years 1978 to the present (1985). The ARC has an
AFSC number change history file which tracks all changes from March 1965
through the present. In addition, aptitude requirements information for
each AFSC is stored and accessible (Figure 2). Also, the ARC subsystem
contains KINTON (a contractor) and OMC study information including Average
Task Difficulty Per Unit Time Spent (ATDPUTS), validity/reliability
information, statistics, and minimum/maximum task information. It should be
noted that the KINTON reports (approximately 200) were produced in the
Aptitude Requirements Benchmarking research, while OMC studies are completed
with each occupational survey of an AFSC (KINTON uses a 25-point scale while
OMC uses a 9-point scale for ATDPUTS).


Since the ORDB is a menu-oriented, tutorial system, access and use of
this and the following subsystems are quite easy and efficient. You only
really need to have an idea about the AFSC structures in the Air Force.
This structuring is readily apparent in the use of the statistics subsystem.

3. Statistical Variable. The statistics subsystem contains demographic,
aptitude, education, training, turnover, and duty related information on Air
Force enlisted personnel. Information is sorted by AFSC, population group,
and year for a total of 133 different variables for the most current 3 years
of data. Population group is based on enlistment status by Total Active
Federal Military Service (TAFMS) (0-4 years, 5-8 years, 8+ years, total
sample, or current year's accessions). See Figure 3 for a partial list of
variables and their level of detail (C, L, or S). An AFSC must be valid for
the type of detail available and it must have existed at the end of the year
for which data are being requested (validity of an AFSC can be checked in
the ARC subsystem, and the validity of detail can be checked on the menu
available in the statistics subsystem).

While 3 years of data are stored on-line, earlier years of statistical
data will be accessible via batch run. The sources are the Uniform Airman
Record (UAR), Pipeline Management System (PMS), Airman Gain and Loss (AGL),
and Processing and Classification of Enlistees (PACE) files stored at AFHRL.

This subsystem uses System 2000 (S2K) Data Base Management System with
COBOL extension (Intel Corporation, 1982).

4. CODAP Report Display. This subsystem was developed to provide the
task scientist and manager with the ability to rapidly retrieve OMC and
AFHRL CODAP reports and review them on the terminal screen. To accommodate
the standard CODAP report format, Datagraphix 132 character remote terminals
are in use at principal user sites. Studies can be selected by either AFSC
or study number (from a menu of available studies). Studies from 1978 to
the present have been loaded, and any report retrieved on the screen can
also be printed at the user's option. This subsystem is programmed in the
Programming Instructions for String Manipulation (PRISM) language (AFHRL/TS,
198[illegible]).

From the initial listing of CODAP reports contained in the OMC studies,
a determination is made as to which CODAP reports will be loaded into the
ORDB (Figure [illegible]). The determination is based on the requirement of
a report that concerns a population group and an AFSC in a similar manner to
the statistics subsystem organization.


When the user selects this subsystem from the ORDB introductory screen,
he or she is provided a choice of the five different CODAP retrieval
features.

a. CODAP report display--The user can view the text of individual
reports and obtain a hard copy of desired reports.

b. CODAP edit feature--The user can use editing commands to select
lines from reports from one or more studies and sort them into a
customized hard copy output.

c. Task-level cross-study--The system will retrieve tasks containing
keywords from multiple studies and print hardcopy, if the user desires.

d. Background cross-study analysis--The system will retrieve 15
variables from multiple studies and print hard copy if the user
desires.

e. Specialty Training Standard (STS) items--The system will
[illegible] STS items for an AFSC in the format of a title
[illegible].

5. Cross-Study Analysis. This subsystem was developed in response to
the need to compare CODAP reports across specialties. Since CODAP variable
numbers and titles are not necessarily standard, identifying corresponding
data in different studies presents a difficult task. To solve this problem,
studies are indexed as they are loaded to the ORDB for a set of 15 variables
and groups. The variables include: Number of Tasks, ATDPUTS, Job
Difficulty Index, Grade, Major Command, Time in Career Field (TICF), TAFMS,
Eligible to Reenlist, Eligible for Retirement, Job Interest, Talent
Utilization, Training Utilization, Sense of Accomplishment, Plan to
Reenlist, and How Assigned to Present Career Field.

Groups that can be analyzed include: Total Sample, Skill Levels 3, 5,
7, 9, 1-48 Months TAFMS or TICF, 49-96 Months TAFMS or TICF, and 97+ Months
TAFMS or TICF. On-line retrieval of corresponding data from multiple studies
on one or a number of job groups can be performed using this system (see
figure). For example, job difficulty of airmen with 1-48 months TAFMS can be
retrieved for comparison across any number of AFSCs. This subsystem is
written in IMSV and uses the same data files as the CODAP Report Display
subsystem.


6. Statistical Package for the Social Sciences (SPSS)/Statistical
Interface. A total of four SPSS procedures are interfaced with the
Statistical Variable Subsystem of the CRDB: ANOVA, BREAKDOWN, T-TEST, and
CROSSTABS (Figure 6). CRDB allows the user to produce statistical analyses
of CRDB variables using SPSS without requiring him to be familiar with
formatting SPSS run cards. The interface program provides easy to follow
instructions for user inputs.

The user may initiate a batch run which will automatically retrieve the
CRDB statistics and create a runstream of SPSS control cards. This will
result in an SPSS run with output. The user has the choice, however, of
running the SPSS automatically or having the file containing the runstream
retained for additional modification.


7. Comments. The comments subsystem provides an opportunity for users
and developers to record information related to the CRDB while using a
remote terminal. Comments can include anything relevant to data contained
in the system, or to the system operation itself. It has been especially
useful as a means of obtaining user feedback and for announcing the
implementation of enhancements or changes.


v  XI  :  is  .  ;rr»rtl'.  lm,:  r\  r:r\  -I  tr»  napor  pr.’>\  ti>  at  AFHRL  <  e . p .  , 
'.rami:.  .i.i'1.,':  Sv.-t-'-s,  basic  SriIIs,  Attitude  K-ciirertrt^,  Periorranci 
jv.ri'trt,  am:  FI'  r.:  * o r k  rorce,  ap.v.  it  is  planned  tor  urn  uitn  Aavar^sc 
r-th-’-.L  c  Ir  : r r  a ' .  At  (.Me,  CKDb  is  used  to  provide  quick  in-derth  orien¬ 
tal  1  •'=  t  .'rA<  a"  well  as  to  rapid;,  respond  to  high  level  canacement 

integrates dispersed sets of data into a consolidated data bank
which can be rapidly accessed. Instead of the normal laborious and
time-consuming task of finding background information by formal requests for
computer searches of Air Force regulations, and/or digging through a
multitude of technical reports and previous studies, the CRDB allows the
user to streamline data retrieval while saving computer resources. CRDB is
valuable for aiding research design, for conducting historical and
cross-study analyses, and for guarding against duplication of effort and
inconsistencies between databases. Clearly, CRDB enhances researcher and
management productivity.
Effects of Absenteeism on the Performance of
Air National Guard Personnel


?;:v.  rt  P.  r,J  h.y  S.  Shan? 

Air Force Institute of Technology


•  >  . •,  s  r  tee  a  os  in;  .  I  1 1  .  r  it  , re  g i  »•  j  substance  to  claims 

-  >*  -.ell  'isoarei  on  ab  -sen  t  ?  3 1  c.t  has  o-~“on  performed 

;  ,  '►  _ :  t  :  ,  s  3ti  es,  1  -*  8  2  r  -1  ecu  c.  s  k  y ,  1  9  ?  7  /  .  This 

■.  ;s  -f  re  se  iron  a  is  oft'n  yielded  mixed  findings  and 

_  t .  n  '■  Ms'ts  '  .  j  .  ,  Muchi  a  sky,  11.7;,  although  sons  rial 

•  - ,:i  o.  -  *n  ~  j  ?  :  3  ■  a  lu  a  t  u  r.  g  traditional  common -s  a. is  3 

i  r-  .'ss  *  inyl.r,  12  8  5  1  .  For  example,  Scott 

i.'.rs  ■ t a- an  i ly s : s  his  brought  some  reliable  support  to 
•  i-s:  >■  ling  c  i :  t  a  of  res  war  veers  m  geo  satisfaction  - 


■  >‘c'  n  .  1.  ,  r  -=»  s  ?  a  r  c  ae  r  s  hart  regarded  absenteeism  as  a 
i.  s  f  it  ;t '  ::a  al  respons-  emitted  oy  an  employee  attempting  to 
Aimir^  ■’  r ;  r  an  unpleasant  wor<  experience.  However,  tnis  view 
;  ■  i  ■>  ►  s  i  "•  •'i  1  t  «  *-  i  ' ;  jn’  a  a  1  .  M  9  8  2  )  not1  that  both 
functional  and  dysfunctional  outcomes  may  be  associated  with 
■--ploy):  absent '  ais.m.  However,  casus  of  absence  abuse  are  still 
cc.nsai'rad  oy  :-''S‘:  theorists  to  be  primarily  dys f unct iona  1 
.  j  in-:. ral  patterns,  anl  tneir  effective  control  is  .assumed  to 
:  a _  ? :  j n  :  *  1  u a n *■  i  i  ;ri  u n  1  s  to  the  organisation. 

Absence control policies have long been justified on the grounds
that absence abuse contributes to labor costs, and it has also
been linked conceptually to lower organisational and individual
productivity. While evidence of the labor costs associated with
absence continues to mount (cf. Mowday et al., 1982), little
systematic thought has been given to the purported absence -
performance linkage (Mowday et al., 1982). This relationship
seems to have been taken for granted.


Common-sense beliefs about absence correlates have increasingly
been challenged by researchers and theorists (e.g., Scott &
Taylor, 1985). Likewise, the assumed relationship between
absenteeism and declining productivity should be subjected to
empirical review. Although occasional investigations (Keller,
1984; Sheridan, 1983) have reported incidental correlations
between absenteeism and job performance measures, this
relationship has not been systematically studied. The present
study reports correlations between absenteeism and job
performance ratings.


Very little published research has been performed examining sick
leave usage in the federal sector. This is perhaps
unfortunate given the rather liberal sick leave policies
currently in existence in the federal government. Leave
administration for Department of the Air Force civilian personnel
is defined in Air Force Regulation 40-631 (USAF, 1971), and these
,.oli'  .-i  u‘  '  n:iip;ri!'  eh  t  ong . .  o  n  t.  the  on  tin  federi)  service.  The 
policy awards 13 days of sick leave per year regardless of
seniority. There is no absolute accumulation limit on the amount
of sick leave employees can accrue, and they may "bank" accrued
sick leave for other uses. Accumulated sick leave may be
•  1  t  '’ire  f'lrly  'cc  r'ducioj  service  time  annuity 

s  1 1  -  ;  a”  .  on?  ur  to  .n;r?’s  •  the  se'vice  time  used  in 
■  d.  ■  /.  r  ;  t  :  :  ci.  .  n  i  s  r.u  cu  as  n  :  •  r  ■  o  a  y  m  a  n  t  s .  In  effect.. 


u.  n.  v  r.  :  r  - 1 
-ns  ■  -  1 1  :  /  •'!; 


i  .  •  :  - .  s . 


n  .  •  >;  s-  ,  ]•  iir  st  d/  ’  3”.  ’  i  linko  between  absinca  us- 
•  5  - 1  ip.j  unir.i  mi:  v"il  p.-licy  (naltcn  &  Perry,  198.,  Larson 

,  :-i-:  i-i,  ' '?  ? ;  .  Pol  i  n  fere-..-  m  tn-'  federal  sector  will 
i.iit.ily  i  r  p  a  ;  t  th.  -  way  sic-c  le^v?  is  is^d,  and  f  u  r  th  > ,  more , 

.  t  ,-.i  irt/i-*-  tn?  w.ij  m  whi'i  ins'-i:-*  i'?lat?s  to  other 
:  r  j  a  n  i  /.  a  f  :  >.ul  j  if  comas,  note,  ole,  30b  per  forma  nee.  A  seconl  major 


r?]  V  n^s'u’o. 


Method 


Par  1  r  it  _-iu  strip  were  provided  oy  1  >4  full-time  employees  of  an 
\i:  h'a-ional  ~  1  a  t  i  base  in  the  west  rn  united  States.  The 
•  i.-d  r'epond'nt  was  iele  ( d  3  6 )  and  between  31  and  40  years 


Absenteeism in the present study was operationalized in terms of
authorised sick leave usage. The organisation provided sick
leave data for all employees covering the period January 1, 1983
to December 31, 1983. Absence records for each employee were
organized into 26 biweekly pay periods.

Considerable debate has occurred on the selection of an absence
metric. Most authors regard frequency indices as
psychometrically superior to time lost measures (Chadwick-Jones,
Brown, Nicholson, & Sheppard, 1971; Hammer & Landau, 1981;
Muchinsky, 1977). A frequency index expresses absence in terms
of the number of incidents of absence, regardless of their
individual durations. In contrast, time lost indices express
absence in terms of the amount of time lost (i.e., hours or days
lost). Given the impact that choice of index may have on study
outcomes, the present investigation employed both a time lost
index and a frequency index.

Since the present data were collapsed into 26 units of information,
this summarization imposed a limitation on development of a
frequency index. Therefore, frequency represented the number of
pay periods in which an absence occurred, regardless of
duration (range = 0-26). Time lost was calculated as
the total number of hours of sick leave taken during the calendar year.
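
The two absence indices described above can be illustrated with a short sketch. It assumes each employee's record is a list of sick leave hours, one entry per biweekly pay period; the function name and the two sample employees are hypothetical and are not taken from the study's data.

```python
from typing import Sequence, Tuple


def absence_indices(hours_per_period: Sequence[float]) -> Tuple[int, float]:
    """Return (frequency, time_lost) for one employee.

    frequency: number of pay periods containing any sick leave,
               so it can never exceed the number of periods (26 here).
    time_lost: total sick leave hours across all periods.
    """
    if any(h < 0 for h in hours_per_period):
        raise ValueError("sick leave hours cannot be negative")
    frequency = sum(1 for h in hours_per_period if h > 0)
    time_lost = float(sum(hours_per_period))
    return frequency, time_lost


# Hypothetical records: 26 biweekly pay periods each.
employee_a = [40.0] + [0.0] * 25      # one long absence in a single period
employee_b = [8.0] * 5 + [0.0] * 21   # five short absences in separate periods

print(absence_indices(employee_a))
print(absence_indices(employee_b))
```

The two hypothetical employees lose the same total time but differ on the frequency index, which is why the choice of index can change a study's outcome. Several separate absences falling inside one pay period count only once under this period-based frequency measure.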




Performance Appraisals

Two sources of employee performance appraisals were utilized.
Self-appraisals were provided within a larger survey
questionnaire. The self-appraisal method was based on a
technique of self-appraisal by Steel and Ovalle (1984),
Feedback Based Self-Appraisal. This method is comparable
to traditional self-appraisals in most respects except that
employees are instructed to base their ratings on feedback they
received from their immediate supervisors. Steel and Ovalle
(1984) found Feedback Based Self-Appraisals to be more highly
related to supervisory evaluations than were conventional
self-appraisals. Evaluations were made on five performance dimensions:
quantity, quality, efficiency, problem-solving capacity, and
adaptability. Seven point agree-disagree rating scales were
used.


Supervisory appraisals were solicited from each employee's
immediate supervisor. These ratings were made on the same
dimensions as the self-ratings. In addition, an overall
rating of employee effectiveness was obtained. The supervisory
ratings were distributed on 7-point scales ranging from (1) "this
employee is far worse than the typical employee" to (7) "this
employee is far better than the typical employee."
Performance - absence frequency correlations were all
nonsignificant. However, correlations between the time lost
absence measure and two performance dimensions, efficiency and
adaptability, were statistically significant (p < .03). Given the
general unreliability and insensitivity usually attributed to
performance appraisal measures (e.g., Schmidt, Gast-Rosenberg, &
Hunter, 1980) and the unreliability of time lost absence measures
(Muchinsky, 1977), the magnitude of these significant
correlations probably underestimates the true relationship
between absence and performance. These data may reasonably be
construed as indicating that excessive sick leave usage is likely
to have detrimental effects on the work performance of federal
employees.


Discussion

Although at first blush the significant correlations appear to be
small and potentially trivial, we believe they probably
underestimate true performance - absence relationships. We
believe the cards were stacked against finding significant
correlations. The purported reliability of both variables is
low, and as Hunter, Schmidt, and Jackson (1982) have stridently
pointed out, such artifacts as error of measurement contribute
substantially to the apparent magnitude of bivariate
relationships. Furthermore, we believe the generous sick leave
accumulation policy in force in the federal sector removes the
"use or lose" incentives associated with most sick leave plans.
This policy should have the effect of reducing absence criterion
variance because less total sick leave will be taken. Hence, we
believe the results give genuine support to the long-standing
belief among personnel managers that absence prone employees are
also poor performers. However, the correlational nature of the
data prohibits any causal inferences. It is impossible to
determine whether the performance of sick leave abusers is rated
low because they are absent so often (i.e., performance judgments
are influenced by general negative impressions) or because their
performance genuinely suffers from so much lost work time.
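
The measurement-error argument in the paragraph above can be made concrete with the classical attenuation formula. This is a sketch under the usual classical test theory assumption of uncorrelated measurement errors; the reliabilities and true correlation in the comments are hypothetical illustrations, not values reported in this study.

```latex
% Observed correlation r_{xy} as a function of the true correlation
% \rho_{xy} and the reliabilities r_{xx} (absence) and r_{yy} (performance):
r_{xy} = \rho_{xy}\,\sqrt{r_{xx}\,r_{yy}}
\qquad\Longrightarrow\qquad
\rho_{xy} = \frac{r_{xy}}{\sqrt{r_{xx}\,r_{yy}}}

% Hypothetical example: \rho_{xy} = .30,\; r_{xx} = .60,\; r_{yy} = .50
% r_{xy} = .30 \times \sqrt{.30} \approx .16
```

With two unreliable measures, the observed correlation in this hypothetical case is roughly half the true value, which is the sense in which small observed correlations may understate the underlying relationship.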

This study yielded a couple of surprises. Many researchers have
observed that frequency absence metrics are more reliable than
time lost absence scales. The degree of reliability of an
absence measure should have a profound effect on the ability of
that metric to correlate with other variables. This fact was
clearly manifest in Scott and Taylor's (1985) research. However,
the present study failed to obtain significant correlations
between performance and absence frequency, although significant
correlations were obtained for the normally inferior time lost index. We
suspect that collapsing the data into 26 pay periods reduced the
total amount of variance obtainable with the frequency metric,
and perhaps this constraint on the potential variance produced
the poor results with this absence index.
IN SEARCH OF HONESTY


Allen N. Shub, Ph.D.

Department  of  Management 

Northeastern  Illinois  University,  Chicago,  Illinois 


In the 4th Century B.C., legend has it that a philosopher named Diogenes
wandered about the Athenian countryside in broad daylight with a lighted
lantern clutched in his outstretched hand. He was searching for an honest man.

Now, more than 2,300 years later, the search continues as governmental
agencies and private industry look for honest employees. In most employment
settings today the demand for help is great and often continuous, especially
where lower wages have effected maximum turnover and lower reenlistment, where
responsibilities are great and temptations are even greater, and where an
indifferent society has spawned an atmosphere more prone to dishonesty and
violence than its work-ethic-oriented predecessors.

Is it that hard to find an honest and stable employee?

Consider these facts. According to the U.S. Department of Commerce, at
least 30% of all business failures each year are due to employee theft and
other forms of employee crimes. A study conducted in 1983 concluded that
approximately one-third of all employees steal from their employers.

What can be done to combat these losses? To answer this question, it is
helpful to understand theft in terms of the profile of a dishonest person and
the theft triangle.

Profile  of  the  Dishonest  Individual 

Research by Dr. William Terris and his associates at London House, Inc., a
Chicago-based psychological testing firm, has shown that employee thieves
perceive themselves as average persons in a basically dishonest world. In
contrast, honest and stable persons perceive themselves as above-average
persons in a basically honest world. The employee thief accepts
rationalizations of theft behaviors and is tolerant of and less punitive toward
theft. If caught, the employee thief responds:

"I did it because...

my supervisor is doing it.

I needed it more than they did.

the company pays me so little for what they got from me.

the company took advantage of me.

it's covered by insurance. No one loses."

In addition, Terris's research has shown that most employee thieves
interviewed do not acknowledge their acts of theft as crimes; a criminal is
someone who steals more than they do.


Researchers have noted that three elements must be present for an employee
to steal: opportunity, need, and attitude.

Opportunity. Most employees have daily opportunities to steal without
fear of being caught or punished. In many situations neither apprehension and
punishment nor security devices completely deter employee theft. In fact, in
some situations, security systems provide the motivated thief with the
challenge to be even more creative.

Need. Everyone has needs and wants. Faced with increasing consumer
prices (rent, unexpectedly high medical bills, or other expenses), honest
employees will either work harder at their jobs in hopes of getting a raise or
promotion or they will take on an extra job. The dishonest employee, however,
will turn to theft, wherever it is easiest. Often, it is easiest where he or

2  h  n  u  -  n  r  Lx  c 

Attitude. Attitudes determine how needs are satisfied. Honest employees
will  resist  the  temptation  of  theft  because  of  strong  moral  attitudes. 

It would appear that if employers could somehow assess attitudes toward
theft, they would be able to screen out individuals prone to theft. In the
past, however, employers have used the Diogenes technique--the lighted lantern
in broad daylight--and have gotten about the same results. Let's examine some
of these traditional hiring techniques in more detail.

Selection  Techniques 

Good business practice dictates that whatever selection procedure is used
in the hiring situation, it should be validated; that is, it should be
job relevant and predictive of future job performance.

Selection  Interviews.  The  interview  is  widely  used  in  the  selection 
process.  It  is  hard  to  imagine  hiring  or  recruiting  an  individual  without 
having  a  face-to-face  discussion  with  the  applicant. 

The typical interviewer relies on personal, subjective values, opinions,
and intuitions when making hiring or recruiting decisions. Obviously, there
are many pitfalls to this approach. Winning the confidence of an interviewer
(and possibly winning the position) can depend upon whether a person is

however, does not reliably indicate that a person will have the necessary
skills and abilities to successfully perform a job. And it certainly does not
reliably indicate whether a person is likely to steal on the job. In fact,
data on the validity of the interview as a selection device have been poor.
Yet, the interview is often the single most used method of selection.

Application Blanks. Nearly all employers require the completion of
application blanks. While responses to questions on an application may be
somewhat distorted or exaggerated, falsification is minimized to the extent
that an applicant believes that the responses are verifiable. Rarely will an
individual admit to leaving a past position because of theft (although one
creative individual, later discovered to be a thief, gave as a reason for
leaving his last job: "They couldn't afford to keep me").

Resumes. The purpose of a good resume is to sell the applicant to the
recruiter. To that end, it is neither in the applicant's best interest, nor
reasonable within the format of the typical resume, to include an admission of
theft. The trend today is for "formula" resumes, especially with the
proliferation of employee outplacement, counseling, and marketing
organizations. Certainly, resumes will continue to be used by recruiters, but
the resume cannot be expected to be an adequate indicator of future job success
or of future employee theft.

Reference and Background Checking. Reference checking generally consists
of verification of previous employment, education, personal and business
references, and any other information supplied by the applicant that is
considered to be job-relevant and deemed important enough by the recruiter to
investigate. When the situation warrants, credit and criminal background
checks are also made. The name of a reference is provided by an applicant
with the expectation that the reference will be favorable. Even when names of
references are not specifically provided by an applicant, as in some
background investigations, most references tend to be positive. Previous
employers are fearful of liability in revealing employee poor performance or
dishonesty and are, therefore, very reluctant to supply negative information.
Moreover, since some 20% of employee theft goes undetected, previous employers
may not even be aware of a former employee's crimes. Reference checks are
necessary, but they simply are not enough.

Another pitfall of reference checks and background investigations is that
they can be worthless when done superficially by untrained investigators. And
many personnel assistants or governmental clerks tend to be just that.

Polygraph Examinations. Diogenes was not the only person in ancient times
searching for the honest individual. Both the ancient Chinese and the Arabian
Bedouins, pioneers in the development of lie detector tests, believed that the
dishonest person's mouth became dry while engaging in deception. To test for
dry mouth, the Chinese required that the person chew and spit out rice powder
and the Bedouins required that the person lick a hot iron. Dry expectorated
powder and burned tongues were considered as evidence of lying. While Diogenes

presumably  some  of  the 

Chinese and Bedouins passed this crucial test.

A distinction must be made between two types of dishonesty--lying and
stealing. The polygraph was designed to assess untruthfulness. It does so by
monitoring the candidate's physiological reactions (pulse rate, blood pressure,
respiration, and galvanic skin response) to direct questions. The polygraph is
especially useful in facilitating the process of obtaining admissions, often
even before the actual polygraph examination has begun.
and training vary greatly across the country,
or this screening method highly variable. While
screening out those candidates with previous criminal
the polygraph can screen out those who,


right circumstances and opportunity, are prone to stealing.


Paper-and-Pencil Tests. Until comparatively recently, paper-and-pencil
tests for assessing proneness to theft have been virtually nonexistent.
Originally developed as a substitute or alternative to the polygraph,
paper-and-pencil honesty tests have become entities in their own right.


To validly predict proneness to theft, a good honesty test should assess
beliefs about the extent of theft in society, positive attitudes toward theft,
ruminations about theft, perceived ease of theft, interthief loyalty,
likelihood of detection, knowledge of employee theft, punitiveness,
rationalizations about theft, assessments of own honesty, and theft admissions.


In  addition,  the  test  should  contain  a  distortion  scale,  to  detect 
attempts  to  "fake  good"  on  the  test. 

All major honesty tests available today attempt, with varying degrees of
success, to meet the above requirements in measuring attitudes toward theft.
Only two tests, however, appear to be backed up by major, published research:
the London House Personnel Selection Inventory and the Reid Report. There are
several other tests for which very little technical data are available,
including the Stanton Survey, the TA Survey, and the Wilkerson Pre-Employment
Audit.


London House Personnel Selection Inventory. The Personnel Selection
Inventory (PSI), developed by psychologist William Terris, was first published
in 1975. There are several forms of the PSI available to measure attitudes
toward and opinions about dishonesty and which include checklists for dollar
values of merchandise, property, and money stolen (the admissions section).
The only test of its type to do so, the PSI has an independent distortion
scale. Other PSI forms available include additional scales for drug abuse,
violence, emotional instability, and safety locus of control (accident
prevention).


Reid Report. The Reid Report was first published in 1950 by John E. Reid,
a polygrapher. The Reid Report consists of two parts: yes/no questions
designed to assess the candidate's attitudes about honesty and theft and a set
of application blank-type items, followed by items designed to obtain
admissions of theft and other crimes.


Comparison of the Two Major Honesty Tests


There are many ways to compare tests that purport to measure the same
thing. Bases for comparison include:


Ease of Administration. Both the PSI and the Reid Report are rather
easily administered. The directions are clear and easy to follow. For both
tests it is necessary that the test be given on company premises under
controlled testing situations. The tests should never be given or mailed to
applicants for them to take on their own.

Ease of Scoring. The scoring of both tests is controlled closely by the
respective publishers. The primary reason for this control is test security;
the publishers thus ensure protection of the scoring keys and the norms.

Both publishers provide a mail-in service whereby the client mails in the
tests and receives a report, generally mailed within 24 hours of receipt.
Using the mails has obvious disadvantages, of course, for clients needing more
immediate results; hence, both publishers offer a telephone method of scoring.
In this approach, information from the test is clerically compiled by the test
administrator and then phoned in to the publisher's test center; the test data
are entered on-line, scored, and reported back to the client within seconds.
For telephone scoring, the PSI has the advantage of a simpler preliminary
coding than the Reid Report, whose preliminary work is more complicated and
requires more manipulation.

On-site computer scoring is another scoring option. The PSI has a
software scoring program that can be run on certain personal and mainframe
computers, thus providing for immediate results.

Ease of Interpretation. Both tests provide a measure of dishonesty, with
assistance from the publishers in determining recommended/not recommended
standards for the hiring of candidates. The PSI, in addition, provides an
independent distortion scale that indicates the extent to which the candidate
is being truthful about the answers given.

Response Format. The Reid Report has mostly yes/no questions, whereas the
PSI allows for five to seven or more choices for each test question, thus
encouraging more admissions of dishonesty, while reducing applicant stress.

Legality. Both publishers show evidence that their tests are
nondiscriminatory and meet various guidelines on employee selection procedures.
In addition, the publishers of the PSI and the Reid Report stress in their
literature their willingness to assist their clients in any litigation that
may be brought against them.

Research Data. Both the PSI and the Reid Report have published research
regarding reliability, validity, fairness, and other technical information.
Reid supplies eight publications; London House provides 50 published studies on
the PSI.
Review by Buros. Both tests have been reviewed in Buros's Mental
Measurements Yearbook. Evaluation of the Reid Report appears in the eighth
edition (1978). The review of the PSI is in press for the ninth edition and is
currently available in retrieval format from the publishers of Buros. The
reader is invited to consult the Buros reviews for additional information.

Concluding Remarks

Paper-and-pencil honesty testing has been somewhat of a controversial
area. Psychologists, especially those in the testing profession, have shown a
healthy skepticism about whether a paper-and-pencil test can really predict
proneness toward employee theft. The major publishers of honesty tests,
especially London House and Reid, have countered with a number of
well-documented studies that show correlations with polygraph results. For
those who are uncomfortable with polygraph examinations as the criterion,
London House has also conducted a series of PSI studies with other criteria and
methodology--including time series studies (where retail store shrinkage was
lowered following the introduction of the PSI for employee screening),
predictive studies (where PSI scores were related to future detected thefts),
and contrasted groups (where it was found that charity collectors who were more
honest as measured by the PSI turned in more money per day than those who
scored as less honest). For those who survey the research literature, it is
hard not to be convinced of the validity and viability of a paper-and-pencil
approach to screening for honesty.

Recently, the field of honesty testing was given even greater legitimacy
in a 1984 review by Sackett & Harris in the prestigious Personnel Psychology.
Employers have always known that paper-and-pencil ability tests can be highly
predictive of future job success. Now the personnel and psychological
communities have recognized that paper-and-pencil honesty tests can indeed
predict future employee theft.

And, oh yes, remember Diogenes? Too bad that honesty tests weren't
available in his day. If so, he would not have had the opportunity to
immortalize himself as the seeker of honest men. You see, Diogenes and his
father were earlier exiled from their native Sinope, reportedly for tampering
with the country's currency!




A MODEL OF PSYCHOLOGICAL STRESS AND CONTROL


M.Kastner  and  W.VePf 

of the Armed Forces,

for Economics and Organizational Science
Munich, Federal Republic of Germany


Summary 

A model for psychological stress which integrates the
approaches of McGRATH (1976) and LAZARUS and LAUNIER (1981)
is presented. In the context of the above-mentioned model, a
further facet of control (emotional/rational) is added to
those which have up to now been generally accepted
(unstable/stable, external/internal, and global/specific);
some empirical proof has been found.

Consequences for diagnosis (esp. the construction of
questionnaires) and therapeutic measures by means of which
optimal stress levels can be reached are drawn from the
linking of stress and rational/emotional control.


1. A simplified version of the model

The model consists of four components: one based on a
"Handlungs"-theory (i.e. a theory of activity), a transactional
one, a part based on facet theory, and the aspect of simultaneous
registration of physical and psychological parameters.
According to the "Handlungs"-theoretical aspect, the unit of
observation must be an action. Such an action can only be
described as the dynamic reciprocal action between human and
situational components. This introduces the second, the
transactional aspect. Reciprocal actions should not be described
idiographically in their dazzling uniqueness, but
these too should be classified, in order to make diagnosis
and measures (including the training of actions) possible.
Facet theory is brought into play as a method of classifying
situations on the one hand and persons on the other. Objective,
physically definable parameters and variables of their
psychological subjective representation are needed for these
situations. For persons we need objective psychological and
physiological (test) variables as well as variables which
subjectively represent a person's own psychological and
physiological state. We have herewith arrived at the fourth
aspect: how can biological-biochemical and psychological
parameters be linked with each other? The goal of diagnosis
and of therapy is always to attain an optimal level of
stress, i.e., the oscillation of actions involving stress
within an intraindividually varying range of stress.

Later on, the contribution of control to the description of
and to predictions concerning the above-mentioned process
shall be discussed.


First the model of an individual action:

If the discrepancy between subjective situational demands and
an individual's own coping potential becomes too great, the
individual no longer tries to meet the demand, but instead
exits from the field, rationalises, etc. Reflex actions
eliminate the right side of the evaluation process. Flexible
actions, in the sense of an optimal level of stress, consist
in an optimal combination of automatic "small circles" and
"large circles" on higher levels of action within the
checking and planning process.

Too little stress leads to insufficient experience with these
combinations. Too much stress leads to ineffective, uncoordinated
combinations, in which normally automated actions rise
in hierarchy and thereby use valuable cognitive capacity.


Figure 1: Circular process of an action performed under
stress (under the interrupted line) and possible
ways of terminating this activity (above the
interrupted line)

2. Facets of control


Processes such as an action performed under stress (see figure 1)
can be influenced by processes of control. Let us first, in the
tradition handed down to us by ROTTER (since 1966), take a
look at external and internal control. An externally controlled
person might possibly focus more on the demands of
the situation than on his own potential for coping with it.
(The opposite can be said for an internally controlled person.)
The comparison between situational demands and a person's
own potential for coping with the situation will be
strongly dependent on external factors. This may have
consequences for the choice of alternatives for action. An
externally controlled person will tend to create an alternative
for action which he subjectively believes exists. He will
most likely "construct" his environment to suit his own

The circular model of stress can be similarly played
through for the various facets of control which have, up to
now, generally been accepted.

Especially when the stress model was applied to depressive
behavior and experience (KASTNER, 1984), it became clear that
the "generally accepted" facets of control were not adequate.
[...] depressive persons obviously suffered from a "control split",
i.e., the rational elements of control did not correspond to
the emotional elements. This led to the splitting up of control
into rational and emotional control.

Rational control can here be defined as a calculated comparison,
which is "void of emotion", between the environment
and oneself, i.e. the comparison between the stress inherent
to a situation and a person's own potential for coping with
the situation as well as the appropriate corresponding causal
attributions. Emotional control can be defined as the feeling
of being in control, of pulling the strings oneself, or of
being controlled and having the strings pulled by someone
else, i.e., of helplessness.

If we consider only high and low degrees of both facets, then
the following systematic combination results:
Thus four groups of persons are formed: those who are both
rationally and emotionally controlled (A), those who are
rationally controlled but emotionally uncontrolled (B), those
who are rationally uncontrolled but emotionally controlled
(C) and finally, persons who are both rationally and
emotionally uncontrolled (D).
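
The four-way classification can be written out as a small lookup. The following is only a minimal sketch in Python: it assumes each facet has already been dichotomized into "high" and "low", and the function name control_group and its boolean arguments are hypothetical, not part of the model as published.

```python
def control_group(rational_high: bool, emotional_high: bool) -> str:
    """Map high/low rational and emotional control onto the four groups A-D."""
    if rational_high and emotional_high:
        return "A"  # rationally and emotionally controlled
    if rational_high:
        return "B"  # rationally controlled, emotionally uncontrolled
    if emotional_high:
        return "C"  # rationally uncontrolled, emotionally controlled
    return "D"      # rationally and emotionally uncontrolled


if __name__ == "__main__":
    # Print the full 2 x 2 combination of both facets.
    for rational in (True, False):
        for emotional in (True, False):
            print(rational, emotional, control_group(rational, emotional))
```

How the cut-off between "high" and "low" would be set on a real questionnaire scale is left open here.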


3. Empirical proof to date

Studies on stress among motorists and persons in a state of
depression have shown that

- both facets of control meaningfully describe definable
qualities and the existence of the four types referred to
above can be proved;

- stress and performance differ for each of the four different
"types of control", in accordance with the type and
difficulty of the given tasks;

- the best performance for simple speed tasks was given by
those who were rationally controlled but emotionally
uncontrolled (B);

- the best performance for complex (intelligence) tests was
given by those who were both rationally and emotionally
controlled.

- The persons in group C accomplished somewhat more in both
types of tasks than the persons in group D.

- Persons with a high degree of rational and emotional control
experience their actions in stressful traffic situations
as being more strongly automated than do uncontrolled
persons.

- In addition, rationally and emotionally controlled persons
perceive stress situations as being less optically complex.
Thus, simplifying the situation and meaningfully integrating
it into the context of one's actions seems to be an
important function of control.

- With respect to physiological variables, persons who are
emotionally controlled but rationally uncontrolled are more
excitable than test persons who are both rationally and
emotionally uncontrolled (see KASTNER & GUTLLOT, 1983).
4. Consequences for future diagnoses and measures

In all of our empirical studies, control as a construct
applicable to persons and controllability as an attribute of
a situation proved to be the most predictive variable. The
psychological variables we used, taken as a whole, described
actions under stress better than did the physiological
variables (see KASTNER, 1980; KASTNER & GUTLLOT, 1981, 1983). In
attempts to link different variables in various manners, it
was shown:

- that, first of all, multiplicative linkage of the variables
intensity, duration and controllability of the action
performed under stress proved more useful than additive
linkage,

- that, secondly, control was always the determining variable
when it was negatively attributed.

- When control was positively attributed, on the other hand,
intensity and, above all, duration, became more important.
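
The difference between the two kinds of linkage can be illustrated numerically. The sketch below is an illustration only: it assumes that intensity, duration and controllability have each been rescaled to the range 0 to 1 and that lack of control (1 - controllability) is what enters the index. This scaling and both function names are assumptions made here, not the scoring used in the studies cited.

```python
def additive_stress(intensity: float, duration: float, controllability: float) -> float:
    """Additive linkage: mean of intensity, duration and lack of control."""
    return (intensity + duration + (1.0 - controllability)) / 3.0


def multiplicative_stress(intensity: float, duration: float, controllability: float) -> float:
    """Multiplicative linkage: product of intensity, duration and lack of control."""
    return intensity * duration * (1.0 - controllability)


if __name__ == "__main__":
    # An intense, long-lasting but fully controllable situation (assumed values).
    print(additive_stress(0.9, 0.9, 1.0))
    print(multiplicative_stress(0.9, 0.9, 1.0))
```

Under these assumptions a fully controllable situation yields a multiplicative index of zero however intense or long it is, whereas the additive index lets the other two components compensate for it.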

Thus, when describing and predicting stress factors, it will
be necessary to try to systematically include all facets of
control. First of all, controllability must be regarded as
peculiar to situations. This contributes to the classification
of the situation. Secondly, but of no less importance,
control must be regarded as peculiar to persons. In this
case, the process in which stress is dealt with will differ,
according to whether a person is externally or internally
directed, unstable or stable, globally or specifically oriented,
emotionally or rationally controlled, or according to
how these various possibilities are combined.

The diagnosis can be made using a questionnaire which is
constructed on the basis of facet theory, and which systematically
varies and combines these facets of control. The therapeutic
consequences, on the other hand, should take this
differentiation of control into consideration. A person who is
rationally controlled but emotionally uncontrolled must be
differently treated, taught, trained etc., in consideration
of his emotional processes, than a person who is emotionally
controlled but rationally uncontrolled.
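
The idea of a questionnaire that systematically varies and combines the facets of control can be sketched as a full crossing of facet levels. This is a minimal illustration in Python: the facet labels are taken from the text, but treating each facet as having exactly two levels, and the names FACETS and facet_profiles, are assumptions made for this example.

```python
from itertools import product

# Facet labels follow the text; the two-level treatment is an assumption.
FACETS = {
    "locus": ("external", "internal"),
    "stability": ("unstable", "stable"),
    "scope": ("global", "specific"),
    "mode": ("emotional", "rational"),
}


def facet_profiles(facets=FACETS):
    """Return every combination of facet levels, one dict per item type."""
    names = list(facets)
    return [dict(zip(names, levels)) for levels in product(*facets.values())]


if __name__ == "__main__":
    for profile in facet_profiles():
        print(profile)
```

With four two-level facets the crossing yields 16 profiles, each of which would need at least one questionnaire item.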
Methodological Considerations in CBI Research
Theodore K. Shlechter and John A. Boldovici
US  Army  Research  Institute,  Fort  Knox,  KY 

The purpose of this paper is to examine some methodological considerations
in CBI (computer-based instruction) research as applied to military
training. We hope that this discussion will help military decision makers in
assessing the usefulness and limitations of CBI. We hope also that the issues
discussed here will help behavioral scientists and others who provide
technical advisory service to the military in assigning priorities to, and
evaluating the outcomes of, CBI research.

The Armed Services are planning to spend millions in the next few years
to expand the use of CBI. The US Army Armor School, for example, is planning
to spend over three million dollars in the next three years to develop 350
hours of courseware for various aspects of armor training. The plans include
training various levels of people to perform tasks ranging from using volt
meters to land navigation and tactical decision-making.

A major source of support for using CBI in military training is the results
of research studies which suggest advantages of CBI over conventional
instruction. Several reviews of literature on CBI, Kemner-Richardson, Lamos,
and West (1984), and Orlansky (1985), for example, have indicated that the
use of CBI led to reduced training time and concomitant reductions in price,
with little or no decrease in effectiveness. Those reviews also suggest that
students favor CBI over conventional instruction. The cited reports have
not, however, analyzed possible methodological problems in the reviewed CBI
research. Without such analyses one is ill-equipped to make recommendations
and decisions about the use of CBI for military training. Our discussion
will address a few of the methodological considerations which seem relevant
in assessing the relative merits of CBI and conventional instruction as
instructional delivery systems.

Methodological Considerations

The  methodological  considerations  in  CBI  research  affect  conclusions 
about  costs,  attitudes,  and  effectiveness. 

Costs 


Orlansky (1985) indicated that the promise of CBI lay in the potential
for saving time and money: millions of dollars could be saved by reducing
military training time. Orlansky provided the following formula for computing
time savings attributable to CBI: Percent Saving = (CBI Time/Conventional
Time) x 100. However, for this formula, CBI was being compared to
conventional classroom instruction, which was not an equivalent medium to
CBI. Avner, Moore, and Smith (1980) have argued that CBI should only be
compared to other self-paced individualized instructional media, e.g.,
programmed texts. Their argument is especially relevant for training time data,
because CBI and other forms of individualized instruction allow the students
to proceed at their own pace, while conventional instruction does not. Time
savings are thus ascribable not to CBI, but to self-pacing, which characterizes
nearly all modern instructional innovations. Time savings of approximately
[...] percent for both CBI and programmed texts as compared to
conventional instruction have in fact been reported when all three media
have been simultaneously tested (Orlansky, 1985). And programmed texts are
usually a fraction of CBI's cost for initial implementation.
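
As a worked illustration of the time comparison: the ratio as printed above expresses CBI time as a percentage of conventional time, so the percentage of time saved would be its complement. The sketch below is an illustration under that reading, not Orlansky's own computation; the function names and the sample course times are hypothetical.

```python
def time_ratio_percent(cbi_time: float, conventional_time: float) -> float:
    """CBI training time as a percentage of conventional training time."""
    if conventional_time <= 0:
        raise ValueError("conventional_time must be positive")
    return cbi_time / conventional_time * 100.0


def time_saving_percent(cbi_time: float, conventional_time: float) -> float:
    """Percentage of conventional training time saved by the compared medium."""
    return 100.0 - time_ratio_percent(cbi_time, conventional_time)


if __name__ == "__main__":
    # Hypothetical course: 30 hours with CBI versus 40 hours conventionally.
    print(time_ratio_percent(30, 40))
    print(time_saving_percent(30, 40))
```

The same two functions apply unchanged to a self-paced comparison medium such as programmed texts, which is the comparison argued for in the text.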

The costs of alternative media can be examined, not only by comparing
training time, but also by estimating life-cycle costs. Kemner-Richardson et
al. (1984) noted, for example, that the TC system as compared to conventional
instruction would eventually lead to $181,000 a year savings. Such
long-term cost analyses of CBI systems versus other systems traditionally
have involved estimating administrative costs and increases or decreases in
the number of faculty which would accompany implementation of new programs.
Such analyses, however, have provided only a partial picture of long-term
costs. Personnel costs, for instance, also should include expenses associated
with training teachers to be "computer-experts." As Shavelson, Winkler,
Stasz, Feibel, Robyn, and Shaha (1984) observed, using a CBI system in a typical
school necessitated a staff development program which trained teachers to
be thoroughly knowledgeable in computers. Clearly, such hidden costs as
teacher training programs would reduce the potential savings for CBI.

Cost estimates also should include determining the reliabilities of the
compared systems. System reliability is important to measure, because
repairing malfunctions will cost money in repairing the problem area and in
training time losses. One formula (Frances, Welling, & Levy, 1983) is
(Failures Per Hour x Terminals Affected)/(Working Days by Terminals Affected) x
100. The formula, however, only provides a partial index to the educational
and financial losses associated with faulty delivery systems. The index does
not take into account the potential problems of an implemented system. For
the most part, evaluations are conducted for a prototype system which may not
reflect a system's reliability when implemented. Data should then be
collected after the system is fielded and fully implemented.
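
The index can be computed directly from the formula as printed. The sketch below is a literal transcription under two assumptions: that "by" in the denominator means multiplication, and that the same terminal count appears in numerator and denominator, in which case it cancels. The function name, parameter names and sample figures are hypothetical.

```python
def failure_index(failures_per_hour: float, terminals_affected: float,
                  working_days: float) -> float:
    """(Failures per hour x terminals affected) /
    (working days x terminals affected) x 100, as printed in the text."""
    denominator = working_days * terminals_affected
    if denominator <= 0:
        raise ValueError("working_days and terminals_affected must be positive")
    return failures_per_hour * terminals_affected * 100.0 / denominator


if __name__ == "__main__":
    # Hypothetical figures: 2 failures per hour, 5 terminals, 20 working days.
    print(failure_index(2, 5, 20))
```

Because the index says nothing about repair cost or lost training time, it would have to be paired with data collected after fielding, as the paragraph above argues.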

Evaluators should also collect data on administrative costs after the
system has been fully implemented. Orlansky (1985) surprisingly demonstrated
that assumptions about administrative costs associated with CBI have rarely
been systematically analyzed. For example, very few CBI evaluations have
examined the actual expenses (and problems) associated with a CBI system for
updating the instructional materials and for hiring the additional personnel
needed to operate and maintain the system's equipment. Information obtained
when a system is fully implemented is then needed to compare actual costs to
estimated costs and correspondingly to determine the validity of certain
assumptions about CBI's value for administrative purposes.

Attitudes

Students' attitudes toward CBI may be influenced by instructor views.
Clark and Leonard (1985) have suggested that teachers believe that CBI would
help the educational process. One would expect that students would have similar
positive views on CBI because students' attitudes toward an instructional
system are influenced by their teachers' attitudes (King, 1975).
Determining teachers' attitudes is then needed to provide insights into
students' attitudes. [...] the system itself may dictate the
students' views.
Instructor obtrusiveness is relevant in evaluating military CBI programs,
because instructors usually are involved in the evaluation. Draxl and Aggen
(1981) described the need to give briefings to enlist military instructors'
interest in, and support of, new instructional systems. Such briefings may
have engendered positive attitudes in instructors, while militating against
obtaining unbiased reports from students.

Discrepancies between students' responses to questionnaires on the one
hand, and more objective data on the other, were found by Shlechter (1985):
Soldiers reported that using a light pen was easy, when in fact many of their
errors related to problems with using this responding mechanism, such as
holding the light pen incorrectly. Additional efforts to compare self-reports
with objective performance measures seem warranted.

Shlechter also found that students showed fatigue while completing CBI
lessons, but did not report such fatigue on the subjective evaluation
questionnaire. Objective measures of fatigue are an important issue in CBI
research because the human-factors literature (see McVey, Clauer, & Taylor,
1984) indicates that VDTs (visual display terminals) may be "visual discomfort
terminals," with students not being able to use certain types of terminals
for extended periods of time. Evaluators must then measure through
objective indexes human factors variables which may affect users' comfort.

Consideration  also  should  be  given  to  examining  other  factors  which  may 
confound attitude reports. Orlansky and String (1981), for example, reported
that  half  the  courses  which  they  reviewed  lasted  one  week  or  less.  Such 
limited  duration  might  not  be  adequate  for  ascertaining  students'  attitudes 
toward a CBI system. King (1975) and Clark (1985) have noted that most
students  initially  enjoyed  working  with  a  CBI  program;  however,  questions 
remain  about  a  system's  ability  to  sustain  students'  motivation  and  enthusi¬ 
asm  once  the  novelty  has  worn  off. 

Effectiveness


Objective  comparisons  of  CBI  and  other  media  require  that  the  compared 
groups  be  treated  identically  in  every  respect  except  those  under  investiga¬ 
tion.  Instructional  content,  for  example,  should  be  the  same  for  the  com¬ 
pared  groups,  as  it  was  in  Morrison  and  Witmer's  (1983)  comparison  of 
computer-based  and  print-based  job  aids.  Other  media  comparisons  did  not,  as 
noted  by  Clark  (1985),  deal  with  the  issue  of  matching  content.  For  in¬ 
stance,  ETS's  often  cited  evaluations  of  PLATO  and  TICCIT  (Alderman,  Appel,  & 
Murphy, 1978) did not include a specific comparison of course materials
presented in the compared classroom. Any differences in ETS's evaluations might
have  been  due  to  presenting  different  content  and  not  to  any  features  inher¬ 
ent  in  the  CBI  system.  Evaluators  cannot  make  any  claims  about  the  relative 
effectiveness of CBI as an instructional delivery system unless unconfounded
comparisons  are  made  with  other  media. 

Unconfounded comparisons between CBI and other media also require that
both educational programs be created with the same degree of effort. Stone
(1985) described the extensive effort involved in developing an in-house set
of CBI lessons. He showed that developing this courseware involved a team of
Army Captains, civilian educational specialists, and outside consultants
working for a year. What typical Army instructors go to such an effort to
develop their daily lesson plans? CBI evaluators must then take into
consideration such differences in instructional efforts when discussing their
evaluation results. If differences between CBI and conventional instruction
only revolve around this issue of instructional effort, then the Army should
consider putting forth similar efforts in creating instructional plans to be
used by classroom teachers.

CBI evaluators should also determine a system's instructional efficiency
by its ability to help the student accomplish different levels of transfer.
Clark and Voogel (in press) suggested that CBI was usually geared for immediate
transfer and frequently ignored the skills and strategies necessary for
long-term retention. They also suggested that CBI courseware was not geared
for cognitively oriented tasks, such as problem solving. These trends are
disturbing to educators, who place a premium on long-term mastery of information
and the ability to help students develop problem-solving skills.
Unfortunately, very few CBI evaluations have examined the long-term and
cognitive impact associated with this medium.


The  apparent  effectiveness  of  CBI  may  also  be  an  artifact  of  unwarranted 
instructional  prompting.  Clark  and  Leonard,  after  reviewing  42  randomly  se¬ 
lected  civilian  CBI  programs,  found  that  teachers  usually  provided  CBI  groups 
with  more  instructions,  e.g.  prompts,  to  complete  the  tasks  than  they  pro¬ 
vided  for  control  students.  The  extent  to  which  prompting  is  a  problem  de¬ 
pends,  of  course,  on  the  observed  amount  of  prompting  relative  to  the  amount 
intended  by  the  system's  designers.  Unwarranted  prompting  should  then  be 
another  variable  measured  in  CBI  research. 

Summary  and  Conclusions 

In summary, the following methodological considerations have been
discussed regarding CBI research: 1) make unconfounded comparisons between
the CBI system and other appropriate educational media; 2) measure hidden
life-cycle costs associated with the delivery system; 3) determine the system's
reliability; 4) measure actual life-cycle and reliability costs for an
implemented system; 5) determine teachers' attitudes toward the CBI system;
6) control for possible "instructor obtrusiveness" effects; 7) substantiate
subjective evaluation data with more objective measures; 8) measure human
factors variables with objective indexes; 9) control for possible
confoundings due to insufficient testing duration; 10) make sure that the
compared media have the same content; 11) control for differences in
instructional efforts; 12) examine students' long-term and cognitive mastery of the
information; and 13) measure unwarranted prompting. Other methodological
issues which cannot be described in this paper involve the possible
interactions which may exist between student characteristics and the experimental
treatment. As argued throughout this paper, clearer answers about CBI's
inherent value as a delivery system can be obtained if these considerations
are incorporated into the evaluation process. One cannot conclude that CBI
is the superior educational medium when confounded comparisons are made with
inappropriate media.
These methodological problems reflect the complexities in CBI research.
To the extent that CBI research continues to compare CBI to other instructional
media, then some time-consuming and expensive research procedures must
be employed. For one thing, both cross-sectional and longitudinal data
should be collected. A cross-sectional design is needed for initial assessments
of students' CBI performance, while longitudinal data are necessary to
ascertain the long-term learning associated with the CBI system. Secondly,
systematic programs of research are needed in which priorities are assigned
to independent variables and variables are systematically manipulated and
measured in successive experiments. Such research programs are imperative to
analyze systematically as many of the previously cited methodological
considerations as possible. Systematic analyses would provide, for example,
information about the relationship between estimated and actual CBI life-cycle
costs. Programmatic research is needed to provide further insights into
whether CBI problems are due to courseware or hardware limitations.

Evaluators should perhaps shift focus from questions of inherent superiority
to the identification of the conditions under which CBI and alternative
media produce and do not produce desired results. Various media have various
strengths which must be first enumerated and then matched with intended
instructional settings, objectives, and resources. Zemke's (1984) conclusion
that CBI may best be used as a supplement to classroom instruction is a case
in point.

Military educators should only begin widespread implementation of CBI
after clearer answers are provided about this medium's instructional and
financial value. History has shown that educational innovations which were
implemented without sufficient research and planning were always abandoned
for later technological innovations (Montague & Wulfeck, 1984). There is
currently some evidence that this abandonment process is beginning to occur
for some CBI programs (D. Reed, personal communication, 6 November 1985).
With sizeable financial and personnel investments associated with large-scale
CBI implementation, the military can ill afford to continue this historical
process. This abandonment process would also be unfortunate because computers,
if used properly, could be a valuable instructional tool.
Abstract

The effectiveness of computer-based graphics in training a visual recognition skill (radar
jamming) was investigated. In recognizing jamming, Naval personnel must differentiate between
several types and their characteristics. Computer graphics were developed to enhance the
features of each type of jamming. The graphics were accompanied by captions which gave the
title and a one-sentence description. An experiment was conducted to systematically test the
effects of graphics. Subjects were students at the Fleet Combat Training Center, assigned to one
of six conditions in a 2x3 design. Subjects received training on animated graphics, with or
without captions, still graphics, with or without captions, or captions only, or were assigned to a
control group. Subjects were then tested using videotapes of actual jamming. Test results are
discussed in terms of the relative merits of animated versus still graphics, and the value of
captioning.


Introduction

The use of graphics to enhance the learning process has been of interest to the
educational community for years. Research on the use of graphics and text
illustrations has shown that they can improve learning by (1) helping students
understand what they have read, (2) having pictures substitute for words, and
(3) enhancing learner enjoyment. They can also increase retention of
information by having more than one way of encoding information (i.e., pictures
plus words) and through repetition (Levie & Lentz, 1982).

With the advent of computer-based training, graphics have been an even more
interesting aspect of instruction since they can be made to move, flash, and
present in color. However, the evidence on how critical they are to the success
of the instruction is mixed. Since they increase development cost
significantly, whether they are necessary and the way in which they are used is
of importance (Moore, Nawrocki, & Simutis, 1979). It is likely that the
importance of graphics depends largely on the task to be learned.

In the present study, graphics were a logical extension of the instruction, the
purpose of which was to train visual recognition skills. The graphics were
designed to offer the students a series of simplified visual patterns to help
organize the learning of important visual characteristics. In the present
experiment, two aspects of graphics were explored in addition to the
effectiveness of the graphics in general. First, the value of animation was
addressed, and second, the value of captioning was explored.


Background 

The present experiment was designed to investigate the influence of
computer-based graphics on recognition of visual patterns. The training and
testing of visual recognition skills in Naval personnel has been of
longstanding importance. A critical skill in the Navy is the recognition of
electronic countermeasures (ECM), also known as "jamming", on radar systems. In
recognizing jamming, several jamming types and their visual characteristics
must be differentiated. NPRDC developed a training program to teach the
recognition of 11 different types of jamming. This program used computer-based
instruction combined with videotapes of jamming. The computer-based training
presented the instruction and also presented graphics of the jamming types to
enhance the visual characteristics. This training program was extremely
successful (see McDonald & Crawford, [years illegible]).

The computer-based instruction for recognition of jamming was designed to focus
on categorical features of the different types of jamming. Each type of jamming
has critical visual features which are unique and which are important in
defining and identifying the type. In real life, however, jamming can be very
difficult to identify because the patterns are complex and can be ambiguous
with respect to the critical features. In training, these features were
difficult to display in real-life videotape presentations. As a result, the
videotapes by themselves were not sufficient to train students to identify each
type of jamming. Because of this, graphics were used to highlight the critical
features. These graphics were "cartoon-like" renditions, making the features of
each jamming type very clear and exaggerated. The literature on concept
learning provides evidence that in order to teach students to recognize the
prototypical concept, lots of examples must be given and prototypical features
must be emphasized (Gagne & Briggs, 1979). It was the purpose of the present
experiment to test the use of the graphics as "feature enhancers", since they
were considered extremely helpful in teaching the students by helping the
students attend to the features critical to identification of each type. Three
questions were of interest in the present experiment: (1) were the graphics
effective in teaching the recognition of different types of jamming, (2) were
animated graphics more effective than still graphics, and (3) would captions
(short verbal descriptions of the jamming types) add to the value of the
graphics?

In order to test the effects of graphics without the confounding influence of
the accompanying instruction, the experiment was conducted prior to the start
of the instruction. For experimental purposes, the pretest to the instruction
(which consisted of a videotape with examples of actual jamming) served as the
dependent variable. In the experiment, there were 5 graphics conditions used,
and the students previewed the graphics before they took the pretest and went
on to the instruction. The five conditions used were animated graphics, with or
without captions, still graphics, with or without captions, and captions alone.
These five conditions were compared to a control group that did not receive any
pre-instruction. The use of these particular conditions permitted evaluation of
whether or not graphics were helpful and also was designed to tease out exactly
which aspect of graphics use was the most helpful.

Method 

Subjects 

Subjects were 111 male enlisted Naval personnel, attending ECCM classes and
Advanced Warfare classes at FCTCPAC. Their Navy rates included Operations
Specialists (OS), Fire Control Technicians (FT), Electronic Technicians (ET),
and Electronic Warfare Technicians (EW). These personnel ranged in rank from
E3's to E7's.

A control group was formed from previously collected data in which students
attending the same course were given the training as a whole package and did
not receive the graphics as a preview condition. 93 students comprised the
control group.
Experimental Materials

Two types of graphics were used within the computer-based instruction and were
therefore tested in the experiment. In one part of the instruction, the
graphics used were still graphics with captions. The captions had a title of
the jamming and a short description of what the jamming looked like. In the
other part of the training, the graphics were animated with captions. The
captions used were the same as used in the still graphics. In order to tease
out the contributing factors to the graphics effects, there were five graphics
conditions in the experiment. They are as follows:

a) STILL GRAPHICS WITH CAPTIONS: Two computer frames of each type of jamming
(11 types) with a visual representation (graphic) of the jamming type.
Underneath the graphic was the title of the jamming and a verbal description of
approximately 1 sentence.

b) ANIMATED GRAPHICS WITH CAPTIONS: Two frames of each type of jamming in
which the graphic of the jamming type was animated (i.e., the picture presented
a moving jamming type on the simulated radar scope). Underneath the graphic was
a title and a verbal description.

c) STILL GRAPHICS WITH TITLE AND NO CAPTIONS: The graphics in this condition
were the same used in the STILL GRAPHICS WITH CAPTIONS described above, except
that underneath the graphic was the title only and no verbal description.

d) ANIMATED GRAPHICS WITH TITLE AND NO CAPTIONS: The graphics in this
condition were the same used in the ANIMATED GRAPHICS WITH CAPTIONS, except that
underneath the graphic was the title only and no verbal description.

e) JAMMING TITLES WITH CAPTIONS AND NO GRAPHICS: In this condition, which
served as a "control" to the graphics conditions, there were no graphics
presented; only the jamming titles and verbal descriptions of each type of
jamming were presented.

Experimental  Procedure 

The data for the graphics experiment were collected from May 198[?] to June
198[?]. The experiment was designed so that the preview condition would "fit"
onto the front end of regular course materials and computer-based instruction.
ECCM and Advanced Warfare classes were used for this experiment. These classes
were each scheduled once a month or every two months. Each class was comprised
of 8-10 students. Subjects from each class were randomly assigned to one of the
five preview conditions. The students took the preview condition on one day.
After a one-day delay, students in all conditions took the pretest. After the
pretest they proceeded to complete the computer-based training (which is not of
concern in this paper).

Dependent Measures

The dependent measures used in the experiment were the scores obtained on the
pretest used in the computer-based instruction. The pretest consisted of
videotaped examples of actual jamming. There were 2 videotaped examples of each
type of jamming (22 items). In addition, the response time for recognition of
each example of jamming was measured in seconds. Finally, for experimental
groups, total time spent in the preview condition was measured (in minutes).

Equipment 

The microprocessor used for the training was the TERAK, an LSI-11 based dual
floppy disk drive system with a 32K word memory capability. It included a
keyboard and CRT display for the presentation of black/white graphics and text.
The computer was used for presentation of instruction and graphics, for
presentation of tests and test results, and for data collection. The video
equipment used for videotape presentations was a Betamax videotape player and
TV monitor.
Table 3 presents the total time (in minutes) spent actually previewing the
graphics for the experimental groups. An analysis of variance revealed
significant differences between experimental groups (F statistic illegible;
p < .001). A subsequent test for differences revealed that the animated
graphics group with captions (group J) spent more time viewing the graphics
prior to taking the pretest.
Conclusions and Discussion

Three questions were of interest in the present experiment. First, the overall
effectiveness of graphics in teaching the skill of jamming recognition was
explored. Second, the value of animation was tested, and finally, the use of
captions to accompany the graphics was addressed. The results of the experiment
revealed that graphics were helpful, and specific information about the use of
animation and captions was provided as well.

The graphics groups outperformed the control groups on the pretest, so even
though they had not seen actual jamming before, the graphics provided them with
enough information to recognize actual instances of the jamming. A more
detailed look at the results showed that it was the animated with captions
group which accounted for the significant differences. This finding suggests
the importance of both animation and captions. Interpretation of this finding
must be made with caution, however, because of the fact that there was some
similarity between the animated condition and the test condition (they were
both moving pictures). Captions appeared to add to the value of the animated
condition, although there were no main effects for captions. The captions may
have provided the animated group with a more efficient way to process and
retain the preview information.

Analysis of the time spent previewing the graphics provides another clue to the
superior performance of the animated with captions group. This group spent more
time in previewing the graphics than did the other graphics groups. Here again,
the combination of animation with captions may have given this group more
information to process, thereby accounting for more time spent. In terms of
actual response time in the test itself, there were no significant differences
between groups.

The results of this experiment provide some interesting guidelines for the use
of graphics. First, graphics appear to be a useful way of giving students
advance information about visual characteristics of complex patterns. The
findings of this study do suggest that it is important to consider the
characteristics of the visual pattern to be learned in developing the graphics.
In our case, simplifying the visual pattern was important. However, including
the movement through animation turned out to be critical even though it did add
some complexity. Finally, the use of captions cannot be trumpeted based on this
experiment, but the findings do suggest that they added to the value of the
animated graphics.
Recent Developments in Job Evaluation Research
Stanley D. Stephenson
School of Business
Southwest Texas State University
San Marcos, Texas 78666

Job evaluation is widely used to establish a specific salary
for a specific position. The point system is the most widely used
job evaluation procedure. In this system, one or more factors
are used to rate each job. Research has typically yielded four
factors: Skill, Effort, Responsibility, and Working Conditions.
Each of these factors can have one or more subfactors, or scales.
For example, under the Skill factor Fraser, Cronshaw, and Alexander
(1984) used four scales: Education, Experience, Accuracy,
and Complexity. Within each factor (or scale) there are several
levels of worth; each level is assigned a certain number of
points. Judges (typically experienced employees) then rate each
factor, usually based on information contained in a job description,
and assign the established number of points. The total
number of points across factors equals the job's worth to the
organization. All jobs are then ranked according to their point
total, and wages are set.
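The point-system procedure described above can be sketched in a few lines of
Python. This is only an illustration: the point values in POINT_TABLE, the
level assignments, and the function names are hypothetical and are not taken
from any instrument cited in this paper.

```python
# Hypothetical point table: for each factor, the points attached to each
# level of worth (level 1 first). Values are invented for illustration.
POINT_TABLE = {
    "skill": [50, 100, 150, 200],
    "effort": [20, 40, 60, 80],
    "responsibility": [40, 80, 120, 160],
    "working_conditions": [10, 20, 30, 40],
}


def job_points(levels):
    """Sum the points for the level a judge assigned on each factor."""
    return sum(POINT_TABLE[factor][level - 1] for factor, level in levels.items())


def rank_jobs(ratings):
    """Return (job, total) pairs ordered from highest to lowest point total."""
    totals = {job: job_points(levels) for job, levels in ratings.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical judge ratings: the level (1-4) chosen for each factor.
ratings = {
    "clerk": {"skill": 1, "effort": 2, "responsibility": 1, "working_conditions": 1},
    "accountant": {"skill": 3, "effort": 2, "responsibility": 3, "working_conditions": 1},
    "supervisor": {"skill": 3, "effort": 3, "responsibility": 4, "working_conditions": 2},
}

for job, total in rank_jobs(ratings):
    print(job, total)
```

With these invented values the ranking is supervisor (390), accountant (320),
clerk (140); wages would then be set from that ordering.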

In spite of the widespread use of job evaluation to determine
the salary paid to an employee, recent research has been
limited. However, in the late 1970s job evaluation in general
came under somewhat of an attack, and consequently some research
has been conducted in the last several years.

This research was generated by critical comments concerning
the use of job evaluation for determining wages. For instance,
one major report (Treiman & Hartmann, 1981) concluded that the
reliability of job evaluation methods for determining wage levels
has not been established. Treiman (1979) noted that there have
been no studies of the validity of job descriptions which in
theory form the basis for any job evaluation system. The real
impact of those criticisms is that there may now be a "perceived"
lack of credibility concerning job evaluation.

More specifically, job evaluation methods came under closer
scrutiny due to the comparable worth movement which is rapidly
becoming a dominant personnel issue of the 1980s. Comparable
worth refers to receiving equal pay for equal job value. Proponents
of this movement argue that the historical male-female wage
difference is due primarily to a bias inherent in the job evaluation
process. Consequently, comparable worth advocates are
calling for the use of job evaluation methods that can objectively
assess differences in basic job factors, a measurement
ability that may not exist in current job evaluation techniques.

The chief criticism of job evaluation is that it is inherently
judgmental and therefore possibly biased. Bias can enter
the process at two points: in the writing of the job
description, and in the evaluation of the job description with
respect to the factors/scales selected. In other words, the
rating of jobs and the writing of the job description are
wholly subjective events.

The sum of the recent questioning of job evaluation techniques
has been the publication of several research studies
designed to test some of the underlying assumptions, and recent
criticisms of bias in job evaluation. This paper attempts to
summarize the findings of these articles.

Rater  Reliability 

Treiman (1979) has stated that "the reliability of ratings is
not particularly encouraging" (p. 40). Doverspike, Carlisi,
Barrett, and Alexander (1983) conducted their own review of the
literature and found, contrary to Treiman, rather high interrater
reliability across studies. They then had 10 raters rate 20 job
descriptions, for jobs ranging from clerk to accountant to sales
[illegible] to supervisor, on a point system job evaluation
instrument. The instrument had four factors (skill, effort,
responsibility, and working conditions) and a total of 11 scales.

They used a relatively new analytical technique, generalizability
analysis. This procedure uses a random effects ANOVA
design and permits the analysis of each potential source of error
that may affect job ratings. In this study, three facets were
analyzed: jobs, scales, and raters. The authors found that the
rater factor and its interactions with both scales and jobs
produced little variance. With adequate training, sufficient job
information, and a properly designed point job evaluation
system, the job scores produced by the 10 raters yielded adequate
levels of reliability. Moreover, "reliability dropped only
slightly when the number of raters, assumed to be from the
universe of trained raters, was reduced from 10 to 1" (p. 481).
These results certainly do not agree with Treiman's (1979)
conclusion; they also question the need for the usual recommendation
to have a minimum of 10 raters. Others (e.g., Fraser
et al., 1984; Doverspike & Barrett, 1984; Madigan, 1985; and
Stephenson, 1985) produced similar results. Also, on a related
dimension, neither sex of the job incumbent (Schwab & Grams,
1985) nor sex of the job evaluator (Doverspike et al., 1983;
Doverspike & Barrett, 1984) appears to influence final job scores.
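To make the logic of a generalizability analysis concrete, the following
Python sketch estimates variance components from a random effects ANOVA. It is
a simplified illustration, not the cited authors' analysis: it assumes only two
crossed facets (jobs and raters) with one rating per cell, whereas the study
described above also treated scales as a facet. The function names and the
example ratings are hypothetical.

```python
def g_study(scores):
    """Variance components for a crossed jobs x raters design.

    scores[j][r] is the point total rater r assigned to job j
    (one observation per cell, so the interaction is confounded
    with residual error).
    """
    n_jobs, n_raters = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_jobs * n_raters)
    job_means = [sum(row) / n_raters for row in scores]
    rater_means = [sum(row[r] for row in scores) / n_jobs for r in range(n_raters)]

    # Mean squares from the two-way ANOVA without replication.
    ms_jobs = n_raters * sum((m - grand) ** 2 for m in job_means) / (n_jobs - 1)
    ms_raters = n_jobs * sum((m - grand) ** 2 for m in rater_means) / (n_raters - 1)
    ss_resid = sum(
        (scores[j][r] - job_means[j] - rater_means[r] + grand) ** 2
        for j in range(n_jobs)
        for r in range(n_raters)
    )
    ms_resid = ss_resid / ((n_jobs - 1) * (n_raters - 1))

    # Expected-mean-square solutions; negative estimates are set to zero.
    return {
        "jobs": max(0.0, (ms_jobs - ms_resid) / n_raters),
        "raters": max(0.0, (ms_raters - ms_resid) / n_jobs),
        "residual": ms_resid,
    }


def g_coefficient(components, n_raters):
    """Relative generalizability of job scores averaged over n_raters raters."""
    error = components["residual"] / n_raters
    return components["jobs"] / (components["jobs"] + error)


# Hypothetical ratings: three jobs (rows) rated by two raters (columns).
example = [[100, 110], [200, 210], [300, 310]]
components = g_study(example)
print(components, g_coefficient(components, n_raters=1))
```

Calling g_coefficient with different values of n_raters shows how projected
reliability changes as raters are added or removed, which is the kind of
question behind the "10 to 1" comparison quoted above.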

In sum, it appears that job evaluation raters contribute very
little variance to the job evaluation procedure. If properly
trained and adequately informed with respect to the jobs being
evaluated, raters can and do reach reliable consensus about the
worth of jobs. Moreover, this consensus seems to be obtainable
with as few as three raters. Consequently, on the surface
raters do not seem to have a biasing impact on job evaluation.

The term, on the surface, was used to preface a finding reported
by Madigan (1985). As noted, he found rater reliabilities
of at least [value illegible]. However, he also reported that the lowest
standard error of measurement was ± 40 points at r = .90. "Hence
the 95% confidence interval range of 160 points encompassed four
possible classification assignments" (p. 145). Doverspike et al.
(1983) also reported large confidence intervals even with high
correlations. Error variances this large might be unacceptable
for comparable worth evaluation.
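The arithmetic behind the quoted interval can be checked with a short Python
sketch. Two points are assumptions rather than facts from Madigan (1985): the
160-point range is reproduced here by taking ± 2 standard errors (using 1.96
would give roughly 157 points), and the 40-point grade width is simply the
value implied by 160 points covering four classifications. The function names
are hypothetical.

```python
import math


def sem_from_reliability(sd, reliability):
    """Classical standard error of measurement: SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)


def interval_range(sem, z=2.0):
    """Full width of a confidence interval of +/- z standard errors."""
    return 2.0 * z * sem


def grades_spanned(range_points, grade_width):
    """Number of pay grades that fit inside the interval."""
    return range_points / grade_width


# A 40-point SEM at r = .90 implies a score SD of about 126 points.
implied_sd = 40.0 / math.sqrt(1.0 - 0.90)
print(round(implied_sd, 1))

# +/- 2 SEM gives the 160-point range cited in the text.
width = interval_range(40.0)
print(width, grades_spanned(width, grade_width=40.0))
```

The point of the exercise is that a reliability of .90 can coexist with an
interval wide enough to cover several pay grades.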

Madigan (1985) also reported that in the best of three job
evaluation methods he found classification level agreement in
only [number illegible] of 120 interrater comparisons. Moreover, he reported
classification differences of two or more pay grades in 11% of
the cases. Gomez-Mejia, Page, and Tornow (1982) suggested a need
to examine evaluation 'hit rates' (the percent of cases for whom
the estimated and actual grades were the same) as well as the
traditional interrater correlational results. Operationally,
these authors considered a position a 'hit' if it was classified
by the job evaluation system within ± 1 grade of its assigned
grade. They noted that the relationship between correlations
and hit rates is uncertain; a correlation of .82 between predicted
and actual grade may have a lower hit rate than a
correlation of .62. In their best job evaluation method, ± 0 hit
rates occurred in 40 percent or fewer of the cases.
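A hit rate of the kind described above is straightforward to compute. The
sketch below is a minimal illustration with invented grades; the function name
and the data are hypothetical and do not come from Gomez-Mejia et al. (1982).

```python
def hit_rate(predicted, actual, tolerance=0):
    """Share of positions whose predicted grade is within `tolerance`
    grades of the grade actually assigned."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance)
    return hits / len(predicted)


# Hypothetical grades for five positions.
predicted = [1, 2, 3, 4, 5]
actual = [1, 3, 5, 4, 2]

print(hit_rate(predicted, actual, tolerance=0))  # exact-grade hits
print(hit_rate(predicted, actual, tolerance=1))  # hits within +/- 1 grade
```

Reporting the rate at both tolerances, alongside the correlation, shows how
much of the apparent agreement depends on allowing a one-grade miss.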

Consequently, even though rater reliabilities may be acceptable,
this measure may not be capturing the entire picture. Even
when correlations are in the .90 range, the percent of actual
hits may be unacceptable for actual classification purposes.
Madigan (1985) expressed the belief that assessment of potential
error variance in evaluation measures must go beyond traditional
measures to include an analysis of impact on pay or classification
decisions. He also expressed the need to establish acceptable
error intervals for determining wages.

Madigan (1985) summarized, "The psychometric adequacy of job
worth measures generated by point, guide chart, and PAQ [Position
Analysis Questionnaire] evaluation methods is open to serious
question. Results of this investigation indicated that previous
studies of job evaluation understated potential inconsistency in
classification decision making attributable to measurement
error." "Consequently none of the three evaluation methods
evaluated here exhibited the psychometric qualities desired of a
procedure that will serve as the governing criterion in pay
classification decisions" (p. 146).

Job  Evaluation  Techniques 

Gomez-Mejia et al. (1982) rated management positions using
seven job evaluation systems. Similar ratings were reported
across methods. "It appears that, given a common job-analysis
data base and data-collection tool, various methods used to
transform data into a grade prediction can yield essentially
comparable results" (p. 806). Madigan (1985) reported that the
same job rating decisions were reached across three job evaluation
methods. Consequently, both Gomez-Mejia et al. (1982) and
Madigan (1985) found that job evaluation method apparently does
not contribute to job evaluation bias.

Factors/Scales 

As noted earlier, Doverspike et al. (1983) found little
variance due to raters. Scales and jobs and their interactions
produced most of the variance reported. Doverspike et al. also
calculated confidence intervals for scales and jobs and found
them to be relatively large. To the authors, this result suggests
that a replication of the study or the use of different
jobs or scales could produce different results. However, the
literature suggests that four factors (skill, effort, responsibility,
and working conditions) do seem to be reliable across
studies.

These results seem to indicate that the popular job
evaluation methods have the capability to distinguish between
jobs and that factors/scales do measure different dimensions.
Certainly, one would want a job evaluation method both to differentiate
between jobs and to have independent scales.

Doverspike and Barrett (1984) conducted a more detailed
analysis of the impact of scales as they relate to possible sex
bias in job evaluation results. While they did find high rater
reliability, they also reported that reliabilities across scales
were not the same. Moreover, (1) the correlations between scale
scores and total scores varied across sexes; (2) a factor
analysis produced a different set of factors for males versus
females; and (3) partial correlations between scale totals and
sex group scores produced some scales that favored males and some
that favored females. Their results are particularly noteworthy
in that, while raters provided reliable results, some scales
proved to be biased toward one sex or the other.

Worthy of separate mention is the interpretation provided by
Doverspike and Barrett (1984) for scale-sex interaction. They
suggested that for male sex-typed jobs complexity of interactions
with things was associated with higher skill demands, while for
female sex-typed jobs greater interaction with people was
associated with lower skill demands. "Thus, the worth or meaning
of interactions with people and things differed for male and
female sex-typed jobs" (p. 657).

Literature  Summary 

Obviously, the job evaluation methods that have seemingly
stood the test of time are now coming under closer scrutiny.
Rater reliability may be acceptable, but classification hit rates
may be unacceptable. Moreover, traditional methods of measuring
reliability may overstate rater reliability. Also, the concept
of what is an acceptable error rate in classification has not
been addressed.

Some consensus has been reached on the factors that can be used
in job evaluation: skill, effort, responsibility, and working
conditions. The factors themselves do not appear to be biased.
However, subfactors (i.e., scales) may not produce similar
results for all jobs. Scales that emphasize interacting with
things may favor the job evaluation of male jobs, while scales
that emphasize interacting with people may produce lower scores
for female jobs. Obviously, if such scales are not equally
represented in a job evaluation method, bias would result.

Implications of Job Evaluation Research for
Job Analysis As Conducted in the Armed Services

The natural question arises as to why research on civilian
job evaluation methods would be of interest to the Armed
Services. Perhaps the most compelling reason is that DoD has
always stated that it supports Affirmative Action and Equal
Employment Opportunity programs in spirit regardless of whether
or not DoD is actually included in Civil Rights legislation.
Obviously, comparable worth is an ever expanding part of the
civil rights movement. Given that job evaluation and its impact
on comparable worth is coming under closer scrutiny in the
civilian sector, the Armed Services should also be interested
in these issues. The fact that women are playing an ever
expanding role in the Armed Services only adds to this argument.

Another reason is that the Armed Services make extensive use
of job analysis data. An interesting trend of the job evaluation
research is that it is leading more and more to the issues of the
validity and reliability of job descriptions and the underlying
data, often job analysis data, that generate these descriptions.
Or, as stated by Gomez-Mejia et al., "... the instrument that is
used in gathering job analysis data is a critical element in
building a valid and practical job evaluation system" (p. 806).

There are other reasons why the Armed Services should be
interested in the results of job evaluation research. First,
within the Air Force for example, promotion rates for enlisted
personnel are heavily influenced by the results on skill knowledge
tests, tests whose questions are directly linked to the
task inventories generated in the job analysis process. Second,
job descriptions are an important part of the military personnel
structure. Since the on-going research in job evaluation is
heading towards a more definitive study of job descriptions, the
military should also be interested in any validity and reliability
research done on job descriptions. Third, we are beginning
to witness the inclusion of corresponding civilian job incumbents
in job surveys of military specialties. Job evaluation research
would be of interest to GS employees. Fourth, from an academic
view, the military personnel R & D community should be interested
in what the current developments are in the civilian sector both
to stay abreast of current developments and also to have the
opportunity to add to the investigation of the issues. Related
to this fourth reason is the uniqueness of the military environment.
Since the military does not have the usual wage contamination
found in civilian jobs, it may be able to conduct purer
research than is possible in the civilian community.

Suggestions  as  to  the  direction  military  research  should  take 
start  with  the  job  analysis  system  (CODAP)  because  the  task
inventories  used  in  the  CODAP  system  are  the  basis  for  much  of 
the  military  classification  system.  The  first  suggestion  is 
to  re-analyze  existing  CODAP  data  using  recently  developed 
measurement  techniques.  For  example,  generalizability  analysis 
could  be  conducted  on  job  specialties  which  contain  large  numbers
of  female  job  incumbents.  This  analysis  could  take  the  form  of 
treating  individual  tasks  as  job  evaluation  factor  scales  and 
determining  the  proportion  of  variance  attributable  to  tasks, 
[illegible],  factors,  or  sex  group.  Partial  correlation  could  also  be
used  to  link  task  inventory  responses  to  group  membership.  If
the  task  inventories  prove  to  be  non-biased  in  terms  of  sex
difference,  then  the  next  step  would  be  to  measure  the  validity
of  existing  job  descriptions  by  having  judges  independently
evaluate  jobs  based  on  the  current  job  description  and  based  on
the  results  of  the  underlying  task  inventory.
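
The partial correlation step suggested above can be illustrated with a minimal sketch. It is not taken from CODAP or from any of the cited studies; the variable names, the choice of months in job as the control variable, and all data values are hypothetical. It computes the first-order partial correlation between a task rating and group membership with one control variable held constant.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical responses of eight incumbents to one task inventory item.
time_spent = [3, 5, 4, 6, 2, 7, 5, 4]           # relative time-spent rating
sex_group = [0, 1, 0, 1, 0, 1, 1, 0]            # 0/1 group membership code
months_in_job = [12, 30, 18, 36, 10, 40, 20, 28]  # control variable (assumed)

r = partial_corr(time_spent, sex_group, months_in_job)
print(f"partial r (task rating, group | months in job) = {r:.3f}")
```

A partial correlation near zero would suggest that, once the control variable is accounted for, responses to the item are not linked to group membership. A real analysis would repeat this across all tasks in the inventory and test the coefficients for significance.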

Given  that  job  evaluation  research  has  reported  that  there  is 
a  job  by  scale  variance  factor,  task  factor  ratings,  such  as  Job
Difficulty  and  Training  Emphasis,  should  be  investigated  as  well
as  the  number  of  tasks  per  job  factor.  These  investigations
should  study  the  impact  of  job  factors  on  job  structuring  in
general  as  well  as  the  impact,  for  example,  of  the  job  difficulty
ratings  received  by  male  and  female  job  incumbents.

The  basic  research  goal  should  be  to  develop  methodologies 
that  can  be  used  both  to  validate  a  job  analysis  task  inventory
and  also  to  produce  and  validate  subsequent  job  descriptions.  In
addition,  the  Armed  Services  personnel  R  &  D  community  should
also  be  able  to  provide  more  empirical  data  on  job  evaluation
classification  level  confidence  intervals  and  standard  errors  of
measurement,  two  stated  needs  from  recent  research.
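
As a hedged illustration of the two quantities named above, the sketch below applies the classical test theory formula SEM = SD * sqrt(1 - reliability) and builds a normal-theory confidence interval around a job evaluation point total. The point total, standard deviation, and reliability coefficient are hypothetical values chosen for the example, not results from any study.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    return sd * sqrt(1.0 - reliability)

def confidence_interval(score, sem, z=1.96):
    """Interval of +/- z SEMs around an observed score (z=1.96 for ~95%)."""
    return (score - z * sem, score + z * sem)

# Hypothetical job evaluation results (assumed values for illustration).
observed_points = 300.0   # point total assigned to one job
sd_points = 40.0          # standard deviation of point totals across jobs
reliability = 0.91        # interrater reliability of the point totals

sem = standard_error_of_measurement(sd_points, reliability)
low, high = confidence_interval(observed_points, sem)
print(f"SEM = {sem:.2f}; 95% interval = ({low:.2f}, {high:.2f})")
```

If the boundary between two classification levels falls inside such an interval, the level assigned to the job cannot be distinguished from the adjacent level with the stated confidence.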

The  question  is  not  whether  or  not  current  evaluation  techniques
are  working  but  whether  or  not  they  are  biased.  Current
research  suggests  that  there  is  bias  and  that  this  bias  begins  in
either  the  job  description  or  the  data  (often  job  analysis  data)
that  generate  the  job  description.  Given  the  Armed  Services'
interest  in  job  analysis,  they  should  join  in  this  research.  The
fact  that  there  has  not  been  much  job  analysis  research  done  of
late  by  the  military  is  also  not  an  issue;  there  had  not  been
much  job  evaluation  research  done  prior  to  1979  either.  The
issue  is  simply  that,  just  as  we  need  measures  of  validity  of  any
assessment  technique,  we  need  to  develop  measures  of  the  validity
of  both  (1)  the  task  inventories  used  in  job  analysis  and  (2)  the
job  evaluation  methods  themselves.  The  Armed  Services  can  play  a 
major  role  in  this  research. 